Live video streaming webrtc? - html

So I want to be able to let one client distribute live video feed to alot of other clients only watching. Is it possible to use WebRTC for this? Or will I basically have to go through a service like ustream or whatever.

It is possible, with a caveat. One client can establish many outgoing P2P connections to every viewer and stream video to them directly. However, for more than a very small handful of viewers this will quickly saturate the origin's bandwidth and possibly CPU. You would not be able to serve many viewers this way; however this would work entirely without middleman.*
* Safe for the WebRTC connection negotiation server.
To be able to serve a larger number of viewers you'll have to use a centralised distribution server. The source would send exactly one video stream to that server, and the server would stream it to anyone who's interested. This still requires that server to have a beefy CPU and a lot of bandwidth; but this is more realistic to scale than the client side.
You may need to spend a lot of money on such a server; have a look at c3.xlarge and better instances at AWS to get an idea. Using an established, cheap infrastructure like ustream may indeed be the more realistic option.

Related

Basic architecture to serve, stream and consume large audio files to minimize client-side resource consumption and latency

I am trying to build a web application which will need to have audio streaming functionality implemented in some way. Just to give you guys some context: It is designed to be a purely auditive experience/game/idkhowtocallit with lots of different sound assets varying in length and thus file size. The sound assets to be provided will consist of ambient sounds, spoken bits of conversation, but also long music sets (up to a couple of hours). Why I think I won't be able to just host these audio files on some server or CDN and serve them from there is, because the sound assets will need to be fetched and played dynamically (depending on user interaction) and as instantly as possible.
Most importantly, consuming larger files (like the music sets and long ambient loops) as a whole doesn't seem to be client-friendly at all to me (used data consumption on mobile networks and client-side memory usage).
Also, without any buffering or streaming mechanism, the client won't be able to start playing these files before they are downloaded completely, right? Which would add the issue of high latencies.
I've tried to do some online research on how to properly implement a good infrastructure to stream bigger audio files to clients on the server side and found HLS and MPEG-DASH. I have some experience with consuming HLS players with web players and if I understand it correctly, I would use some sort of one-time transformation process (on or after file upload) to split up the files into chunks and create the playlist and then just serve these files via HTTP. From what I understand the process should be more or less the same for MPEG-DASH. My issue with these two techniques is that I couldn't really find any documentation on how to implement JavaScript/TypeScript clients (particularly using the Web Audio API) without reinventing the wheel. My best guess would be to use something like hls.js and bind the HLS streams to freshly created audio elements and use these elements to create AudioSources in my Web Audio Graph. How far off am I? I'm trying to get at least an idea of a best practice.
To sum up what I would really appreciate to get some clarity about:
Would HLS or MPEG-DASH really be the way to go or am I missing a more basic chunked file streaming mechanism with good libraries?
How - theoretically - would I go about limiting the amount of chunks downloaded in advance on the client side to save client-side resources, which is one of my biggest concerns?
I was looking into hosting services as well, but figured that most of them are specialized in hosting podcasts (fewer but very large files). Has anyone an opinion about whether I could use these services to host and stream possibly 1000s of files with sizes ranging from very small to rather large?
Thank you so much in advance to everyone who will be bothered with helping me out. Really appreciate it.
Why I think I won't be able to just host these audio files on some server or CDN and serve them from there is, because the sound assets will need to be fetched and played dynamically (depending on user interaction) and as instantly as possible.
Your long running ambient sounds can stream, using a normal HTMLAudioElement. When you play them, there may be a little lag time before they start since they have to begin streaming, but note that the browser will generally prefetch the metadata and maybe even the beginning of the media data.
For short sounds where latency is critical (like one-shot user interaction sound effects), load those into buffers with the Web Audio API for playback. You won't be able to stream them, but they'll play as instantly as you can get.
Most importantly, consuming larger files (like the music sets and long ambient loops) as a whole doesn't seem to be client-friendly at all to me (used data consumption on mobile networks and client-side memory usage).
If you want to play the audio, you naturally have to download that audio. You can't play something you haven't loaded in some way. If you use an audio element, you won't be downloading much more than what is being played. And, that downloading is mostly going to occur on-demand.
Also, without any buffering or streaming mechanism, the client won't be able to start playing these files before they are downloaded completely, right? Which would add the issue of high latencies.
If you use an audio element, the browser takes care of all the buffering and what not for you. You don't have to worry about it.
I've tried to do some online research on how to properly implement a good infrastructure to stream bigger audio files to clients on the server side and found HLS and MPEG-DASH.
If you're only streaming a single bitrate (which for audio is usually fine) and you're not streaming live content, then there's no point to HLS or DASH here.
Would HLS or MPEG-DASH really be the way to go or am I missing a more basic chunked file streaming mechanism with good libraries?
The browser will make ranged HTTP requests to get the data it needs out of the regular static media file. You don't need to do anything special to stream it. Just make sure your server is configured to handle ranged requests... most any should be able to do this right out of the box.
How - theoretically - would I go about limiting the amount of chunks downloaded in advance on the client side to save client-side resources, which is one of my biggest concerns?
The browser does this for you if you use an audio element. Additionally, data saving settings and the detected connectivity speed may impact whether or not the browser pre-fetches. The point is, you don't have to worry about this. You'll only be using what you need.
Just make sure you're compressing your media as efficiently as you can for the required audio quality. Use a good codec like Opus or AAC.
I was looking into hosting services as well, but figured that most of them are specialized in hosting podcasts (fewer but very large files). Has anyone an opinion about whether I could use these services to host and stream possibly 1000s of files with sizes ranging from very small to rather large?
Most any regular HTTP CDN will work just fine.
One final note for you... beware of iOS and Safari. Thanks to Apple's restrictive policies, all browsers under iOS are effectively Safari. Safari is incapable of playing more than one audio element at a time. If you use the Web Audio API you have more flexibility, but the Web Audio API has no real provision for streaming. You can use a media element source node, but this breaks lock screen metadata and outright doesn't work on some older versions of iOS. TL;DR; Safari is all but useless for audio on the web, and Apple's business practices have broken any alternatives.

Video and audio stream - server to clients only

Is there a way to stream a video and audio on a website just to the clients, using a camera installed on the server - for instance, like youtube does ?
I've started reading webrtc, but if I use webrtc I should create a stun/turn server and other things, which for one way stream I think is not necessary (this is just my understanding of the things..) because I don't need anything from the clients, literally, neither their video, or audio..
So is there a way to achieve this using html5, streaming just in one direction:
server (camera) -> clients
Is there something about this out there, or should I stick with webrtc ?
I'm going to explain a possible solution for this scenario, there might be others, but I hope mine gives you a rough idea of how you could do it and a start point to explore more about the amazing possibilities of WebRTC. Please let me know if something is not understood.
So, WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. Sweet, that is: WebRTC has a quite good browser support (not in every browser though, Safari just started supporting it a month ago with Safari 11). But in this case we want to use WebRTC in the server side. At the end of the day we can still think about peer-to-peer real time communication, where one of our peers is the server.
I don't know if you are familiar with Node.js, but I recommend you to write your Server app with it (<3 Javascript!):
There are a few libraries that wrap WebRTC functionality to be used in the server side, like node-webrtc and node-rtc-peer-connection.
But I recommend you to to take a look at electron-werbrtc, since
the others might be using deprecated methods or be incomplete.
electron-webrtc runs a headless Electron client in the background to
use Chromium's built-in WebRTC implementation. So with it you should
be able to access the Camera in your server and create a stream to be
served to the other peer (the browser).
All above would be the WebRTC related tasks, in this case: streaming video peer(server)-to-peer(browser).
Now, let's talk about signaling process, stun and turn.
Signaling: imagine now a scenario peer-to-peer with 2 browsers, they want to establish a direct connection and stream video and audio between each other. But they don't know each other, like if I don't know your home address, I can't send you a letter. So they need a service that helps them know each other, so they can have the other's IP. This should be done by what is called "a signaling server". If somehow you know the other peer IP, you wouldn't need a signaling server.
STUN/TURN: the scheme above works perfectly in a local area network where each peer has its own IP address and there are no firewalls and routers between them. But otherwise, you can have peers behind a NAT or firewalls, and then your signaling server won't be able to make both peers to discover themselves. If you have peers behind a NAT, you'll need a STUN server, and if you have peers behind firewalls you'll need a TURN server. This is a bit simplified, but I just want you to have the general picture of when you might need STUN/TURN servers.
To better understand Signaling, STUN and TURN, there is a very graphic article that explains them perfectly.
Now, for your scenario:
I think you prob don't need STUN/TURN servers and also you prob don't need to implement the signaling process, because the browsers that are supposed to receive the stream from the server will know that server address, right? So they can establish a WebRTC connection with it.
EDIT: it is likely that you will need to implement some sort of handshake between the server and the clients (browsers), so this will be the signaling process. This is not part of WebRTC and this is why you need to implement it yourself. As I said, it is the way 2 peers can discover each other, but they also exchange information as their local media conditions, like codecs, resolutions they can handle, etc. For your case, your signaling server could be hosted in the same server you use to strea: you can build a small node.js app that runs there and that manages all the signaling process easily, it is not a big deal. I recommend you to read this article, and specially the section "How can I build a signaling service?". In general all WebRTC articles from that site are very helpful.
Does this make sense to you? I think with it you can start digging a little bit more and see if with this is enough or you need to implement more stuff. Hope it helps!

Low latency (< 2s) live video streaming HTML5 solutions?

With Chrome disabling Flash by default very soon I need to start looking into flash/rtmp html5 replacement solutions.
Currently with Flash + RTMP I have a live video stream with < 1-2 second delay.
I've experimented with MPEG-DASH which seems to be the new industry standard for streaming but that came up short with 5 second delay being the best I could squeeze from it.
For context, I am trying to allow user's to control physical objects they can see on the stream, so anything above a couple of seconds of delay leads to a frustrating experience.
Are there any other techniques, or is there really no low latency html5 solutions for live streaming yet?
Technologies and Requirements
The only web-based technology set really geared toward low latency is WebRTC. It's built for video conferencing. Codecs are tuned for low latency over quality. Bitrates are usually variable, opting for a stable connection over quality.
However, you don't necessarily need this low latency optimization for all of your users. In fact, from what I can gather on your requirements, low latency for everyone will hurt the user experience. While your users in control of the robot definitely need low latency video so they can reasonably control it, the users not in control don't have this requirement and can instead opt for reliable higher quality video.
How to Set it Up
In-Control Users to Robot Connection
Users controlling the robot will load a page that utilizes some WebRTC components for connecting to the camera and control server. To facilitate WebRTC connections, you need some sort of STUN server. To get around NAT and other firewall restrictions, you may need a TURN server. Both of these are usually built into Node.js-based WebRTC frameworks.
The cam/control server will also need to connect via WebRTC. Honestly, the easiest way to do this is to make your controlling application somewhat web based. Since you're using Node.js already, check out NW.js or Electron. Both can take advantage of the WebRTC capabilities already built in WebKit, while still giving you the flexibility to do whatever you'd like with Node.js.
The in-control users and the cam/control server will make a peer-to-peer connection via WebRTC (or TURN server if required). From there, you'll want to open up a media channel as well as a data channel. The data side can be used to send your robot commands. The media channel will of course be used for the low latency video stream being sent back to the in-control users.
Again, it's important to note that the video that will be sent back will be optimized for latency, not quality. This sort of connection also ensures a fast response to your commands.
Video for Viewing Users
Users that are simply viewing the stream and not controlling the robot can use normal video distribution methods. It is actually very important for you to use an existing CDN and transcoding services, since you will have 10k-15k people watching the stream. With that many users, you're probably going to want your video in a couple different codecs, and certainly a whole array of bitrates. Distribution with DASH or HLS is easiest to work with at the moment, and frees you of Flash requirements.
You will probably also want to send your stream to social media services. This is another reason why it's important to start with a high quality HD stream. Those services will transcode your video again, reducing quality. If you start with good quality first, you'll end up with better quality in the end.
Metadata (chat, control signals, etc.)
It isn't clear from your requirements what sort of metadata you need, but for small message-based data, you can use a web socket library, such as Socket.IO. As you scale this up to a few instances, you can use pub/sub, such as Redis, to distribution messaging throughout the servers.
To synchronize the metadata to the video depends a bit on what's in that metadata and what the synchronization requirement is, specifically. Generally speaking, you can assume that there will be a reasonable but unpredictable delay between the source video and the clients. After all, you cannot control how long they will buffer. Each device is different, each connection variable. What you can assume is that playback will begin with the first segment the client downloads. In other words, if a client starts buffering a video and begins playing it 2 seconds later, the video is 2 seconds behind from when the first request was made.
Detecting when playback actually begins client-side is possible. Since the server knows the timestamp for which video was sent to the client, it can inform the client of its offset relative to the beginning of video playback. Since you'll probably be using DASH or HLS and you need to use MCE with AJAX to get the data anyway, you can use the response headers in the segment response to indicate the timestamp for the beginning the segment. The client can then synchronize itself. Let me break this down step-by-step:
Client starts receiving metadata messages from application server.
Client requests the first video segment from the CDN.
CDN server replies with video segment. In the response headers, the Date: header can indicate the exact date/time for the start of the segment.
Client reads the response Date: header (let's say 2016-06-01 20:31:00). Client continues buffering the segments.
Client starts buffering/playback as normal.
Playback starts. Client can detect this state change on the player and knows that 00:00:00 on the video player is actualy 2016-06-01 20:31:00.
Client displays metadata synchronized with the video, dropping any messages from previous times and buffering any for future times.
This should meet your needs and give you the flexibility to do whatever you need to with your video going forward.
Why not [magic-technology-here]?
When you choose low latency, you lose quality. Quality comes from available bandwidth. Bandwidth efficiency comes from being able to buffer and optimize entire sequences of images when encoding. If you wanted perfect quality (lossless for each image) you would need a ton (gigabites per viewer) of bandwidth. That's why we have these lossy codecs to begin with.
Since you don't actually need low latency for most of your viewers, it's better to optimize for quality for them.
For the 2 users out of 15,000 that do need low latency, we can optimize for low latency for them. They will get substandard video quality, but will be able to actively control a robot, which is awesome!
Always remember that the internet is a hostile place where nothing works quite as well as it should. System resources and bandwidth are constantly variable. That's actually why WebRTC auto-adjusts (as best as reasonable) to changing conditions.
Not all connections can keep up with low latency requirements. That's why every single low latency connection will experience drop-outs. The internet is packet-switched, not circuit-switched. There is no real dedicated bandwidth available.
Having a large buffer (a couple seconds) allows clients to survive momentary losses of connections. It's why CD players with anti-skip buffers were created, and sold very well. It's a far better user experience for those 15,000 users if the video works correctly. They don't have to know that they are 5-10 seconds behind the main stream, but they will definitely know if the video drops out every other second.
There are tradeoffs in every approach. I think what I have outlined here separates the concerns and gives you the best tradeoffs in each area. Please feel free to ask for clarification or ask follow-up questions in the comments.

When using WebRTC, is a peer-to-peer architecture redundant to build a video chat service like Skype?

We're playing around with WebRTC and trying to understand its benefits.
One reason Skype can serve hundreds of millions of people is because of its decentralized, peer-to-peer architecture, which keeps server costs down.
Does WebRTC allow people to build a video chat application similar to Skype in that the architecture can be decentralized (i.e., video streams are not routed from a broadcaster through a central server to listeners but rather routed directly from broadcaster to listener)?
Or, put another way, does WebRTC allow someone to essentially replicate the benefits of a P2P architecture similar to Skype's?
Or do you still need something similar to Skype's P2P architecture?
Yes, that's basically what WebRTC does. Calls using the getPeerConnection() API don't send voice/video data through a centralized server, but rather use firewall traversal protocols like ICE, STUN and TURN to allow a direct, peer-to-peer connection. However, the initial call setup still requires a server (most likely something running a WebSocket implementation, but it could be anything that you can figure out how to get JavaScript to talk to), so that the two clients can figure out that they're both online, signal that they want to connect, and then figure out how to do it (this is where the ICE/STUN/TURN bit comes in).
However, there's more to Skype's P2P architecture than just passing voice/video data back and forth. The majority of Skype's IP isn't in the codecs or protocols (much of which they licensed from Global IP Solutions, which Google purchased two years ago and then open-sourced, and which forms of the basis of Chrome's WebRTC implementation). Skype's real IP is all in the piece of WebRTC which still depends on a server: figuring out which people are online, and where they are, and how to get a hold of them, and doing that in a massively decentralized fashion. (See here for some rough details.) I think that you could probably use the DataStream portion of the getPeerConnection() API to do that sort of thing, if you were really, really smart - but it would be complicated, and would most likely stomp on a few Skype patents. Unless you want to be really, really huge, you'd probably just want to run your own centralized presence and location servers and handle all that stuff through standard WebSockets.
I should note that Skype's network architecture has changed since it was created; it no longer (from what I hear) uses random users as supernodes to relay data from client 1 to client 2; it didn't scale well and caused rampant variability in results (and annoyed people who had non-firewalled connections and bandwidth).
You definitely can build something SKype-like with WebRTC - and more. :-)

Do HTML WebSockets maintain an open connection for each client? Does this scale?

I am curious if anyone has any information about the scalability of HTML WebSockets. For everything I've read it appears that every client will maintain an open line of communication with the server. I'm just wondering how that scales and how many open WebSocket connections a server can handle. Maybe leaving those connections open isn't a problem in reality, but it feels like it is.
In most ways WebSockets will probably scale better than AJAX/HTML requests. However, that doesn't mean WebSockets is a replacement for all uses of AJAX/HTML.
Each TCP connection in itself consumes very little in terms server resources. Often setting up the connection can be expensive but maintaining an idle connection it is almost free. The first limitation that is usually encountered is the maximum number of file descriptors (sockets consume file descriptors) that can be open simultaneously. This often defaults to 1024 but can easily be configured higher.
Ever tried configuring a web server to support tens of thousands of simultaneous AJAX clients? Change those clients into WebSockets clients and it just might be feasible.
HTTP connections, while they don't create open files or consume port numbers for a long period, are more expensive in just about every other way:
Each HTTP connection carries a lot of baggage that isn't used most of the time: cookies, content type, conetent length, user-agent, server id, date, last-modified, etc. Once a WebSockets connection is established, only the data required by the application needs to be sent back and forth.
Typically, HTTP servers are configured to log the start and completion of every HTTP request taking up disk and CPU time. It will become standard to log the start and completion of WebSockets data, but while the WebSockets connection doing duplex transfer there won't be any additional logging overhead (except by the application/service if it is designed to do so).
Typically, interactive applications that use AJAX either continuously poll or use some sort of long-poll mechanism. WebSockets is a much cleaner (and lower resource) way of doing a more event'd model where the server and client notify each other when they have something to report over the existing connection.
Most of the popular web servers in production have a pool of processes (or threads) for handling HTTP requests. As pressure increases the size of the pool will be increased because each process/thread handles one HTTP request at a time. Each additional process/thread uses more memory and creating new processes/threads is quite a bit more expensive than creating new socket connections (which those process/threads still have to do). Most of the popular WebSockets server frameworks are going the event'd route which tends to scale and perform better.
The primary benefit of WebSockets will be lower latency connections for interactive web applications. It will scale better and consume less server resources than HTTP AJAX/long-poll (assuming the application/server is designed properly), but IMO lower latency is the primary benefit of WebSockets because it will enable new classes of web applications that are not possible with the current overhead and latency of AJAX/long-poll.
Once the WebSockets standard becomes more finalized and has broader support, it will make sense to use it for most new interactive web applications that need to communicate frequently with the server. For existing interactive web applications it will really depend on how well the current AJAX/long-poll model is working. The effort to convert will be non-trivial so in many cases the cost just won't be worth the benefit.
Update:
Useful link: 600k concurrent websocket connections on AWS using Node.js
Just a clarification: the number of client connections that a server can support has nothing to do with ports in this scenario, since the server is [typically] only listening for WS/WSS connections on one single port. I think what the other commenters meant to refer to were file descriptors. You can set the maximum number of file descriptors quite high, but then you have to watch out for socket buffer sizes adding up for each open TCP/IP socket. Here's some additional info: https://serverfault.com/questions/48717/practical-maximum-open-file-descriptors-ulimit-n-for-a-high-volume-system
As for decreased latency via WS vs. HTTP, it's true since there's no more parsing of HTTP headers beyond the initial WS handshake. Plus, as more and more packets are successfully sent, the TCP congestion window widens, effectively reducing the RTT.
Any modern single server is able to server thousands of clients at once. Its HTTP server software has just to be is Event-Driven (IOCP) oriented (we are not in the old Apache one connection = one thread/process equation any more). Even the HTTP server built in Windows (http.sys) is IOCP oriented and very efficient (running in kernel mode). From this point of view, there won't be a lot of difference at scaling between WebSockets and regular HTTP connection. One TCP/IP connection uses a little resource (much less than a thread), and modern OS are optimized for handling a lot of concurrent connections: WebSockets and HTTP are just OSI 7 application layer protocols, inheriting from this TCP/IP specifications.
But, from experiment, I've seen two main problems with WebSockets:
They do not support CDN;
They have potential security issues.
So I would recommend the following, for any project:
Use WebSockets for client notifications only (with a fallback mechanism to long-polling - there are plenty of libraries around);
Use RESTful / JSON for all other data, using a CDN or proxies for cache.
In practice, full WebSockets applications do not scale well. Just use WebSockets for what they were designed to: push notifications from the server to the client.
About the potential problems of using WebSockets:
1. Consider using a CDN
Today (almost 4 years later), web scaling involves using Content Delivery Network (CDN) front ends, not only for static content (html,css,js) but also your (JSON) application data.
Of course, you won't put all your data on your CDN cache, but in practice, a lot of common content won't change often. I suspect that 80% of your REST resources may be cached... Even a one minute (or 30 seconds) CDN expiration timeout may be enough to give your central server a new live, and enhance the application responsiveness a lot, since CDN can be geographically tuned...
To my knowledge, there is no WebSockets support in CDN yet, and I suspect it would never be. WebSockets are statefull, whereas HTTP is stateless, so is much easily cached. In fact, to make WebSockets CDN-friendly, you may need to switch to a stateless RESTful approach... which would not be WebSockets any more.
2. Security issues
WebSockets have potential security issues, especially about DOS attacks. For illustration about new security vulnerabilities , see this set of slides and this webkit ticket.
WebSockets avoid any chance of packet inspection at OSI 7 application layer level, which comes to be pretty standard nowadays, in any business security. In fact, WebSockets makes the transmission obfuscated, so may be a major breach of security leak.
Think of it this way: what is cheaper, keeping an open connection, or opening a new connection for every request (with the negotiation overhead of doing so, remember it's TCP.)
Of course it depends on the application, but for long-term realtime connections (e.g. an AJAX chat) it's far better to keep the connection open.
The max number of connections will be capped by the max number of free ports for the sockets.
No it does not scale, gives tremendous work to intermediate routes switches. Then on the server side the page faults (you have to keep all those descriptors) are reaching high values, and the time to bring a resource into the work area increases. These are mostly JAVA written servers and it might be faster to hold on those gazilions of sockets then to destroy/create one.
When you run such a server on a machine any other process can't move anymore.