I would like to build a web application that lets two peers see and hear each other using video and audio streaming with HTML5 and no plugins (except for IE, that I pretend to use getUserMediajs to use a flash fallback).
I also want to transmit that data using NodeJS but I have no idea where to start. In an example:
Peer A <---> Node JS <---> Peer B
I'm interested in this Peer 2 Server 2 Peer approach instead of a Peer 2 Peer solution like PeerJS because:
1) I think it will be more compatible will all browsers. If this is not entirely true, please let me know.
2) PeerJS (which I'm not interested in) relies on black magic STUN-TURN-ICE signaling for some cases. I read somewhere that only the 70% of the connections are suitable for this kind of transmission, and I can not afford a loss of the 30%. Again let me know if this is no entirely true.
I have already played around with socket.io and know the concepts of getUserMedia() to get user's webcam, but don't know how to link that with socket.io and transmit that to the other client.
The browser compatibility has nothing to do with adding a server side component. You could be p2p, or p2s2p, if what you send is not recognized by the receiving browser, it won't work.
ICE is mandatory for webrtc, you can't do without, period. by default, you can only connect to computers in the same network (hosts candidates). If you provide a STUN server, you will be able to connect in 70% of the cases all together, much less in enterprise context. http://webrtcstats.com/webrtc-revolution-in-progress/ has the latest stats from some vendors. You can see that for social sites, as of june 2014, 92% of the calls can work through firewalls and NAT using simple STUN. The remaining of the called needed to be relayed through a TURN server. You have a lot of Free STUN server providers out there, this is the minimum you should use.
webRTC for Desktop IE and Safari.
While flash callbacks are interesting (read, easy) they pause two problems:
They do not generate video stream compatible with peer connection or with HTML5. Not being compatible with peer connection means, you cannot send the images or the video, but only use it locally. Not being compatible with HTML5 means you cannot use the generated image and video in a element, and you do not have anyway simple way to render it outside of a flash plugin element. In the case of the shim you point to, they copy over from the flash plugin to HTML each single frame, and they mention in the read me that this is too computationally extensive to be used for lived video.
flash use different protocols (RTMP, RTMFP, ..) and codecs from webrtc and they are not mutually interoperable. You would need to maintain both separately or to have a complicated, dual use infrastructure to deal with it. OpenClove is a vendor that propose such dual purpose infrastructure for example.
Another solution is to install on Desktop IE and Safari a webRTC plugin (not flash), which implements "pure" webRTC. In which case, you can directly interoperate with chrome, firefox, opera, and any other browser which implements natively webRTC 1.0
We propose such a plugin, for free (no cost) and for all (not vendor specific) here
Whatever you do, you need WebRTC support on the browser ("no plugins"). So, the "it will be more compatible will all browsers" is a moot point because browser support
Related
We develop an IP camera product which streams H.264/MPEG4/MJPEG video via RTSP/UDP. It has a web interface, currently we use the VLC Firefox plugin to allow viewing of the live RTSP stream in the browser but Firefox are dropping support for NPAPI plugins so that's currently a dead end.
The camera itself is a relatively low-powered ARM SoC (think Raspberry Pi level) so we don't have vast spare resource to do things like transcode streams on-the-fly on the board.
The main purpose is to check the video stream is working correctly from the web interface, so streaming a new stream (or transcoding it) in some other format/transport/streaming engine is less desirable than being able to somehow play the original RTSP stream directly. In regular use the video is streamed via RTSP into a VMS server so that's not up for alteration.
In an ideal world the solution would be open-source cross-browser and happen inside an HTML5 tag, but if it works in one or more of the most popular browsers we'll take it.
I've been reading all sorts of stuff here and around the web about the brave new world of the HTML5 video tag, WebRTC, HLS, etc. and have yet to see anything that looks like a sensible and complete solution that doesn't involve some extra conversion/transcoding/re-streaming, often by some half-supported framework or an extra server in the middle which is not a viable solution.
I haven't yet found a proper description of what may or may not be required to "convert" our stream to whatever-html5-video-likes, whether it's just a slightly different wrapper around the same basic video stream or if there's a lot of overhead and everything is different. Likewise it's not clear if the conversion could be achieved either on-board or perhaps even in-browser using JS.
The reason for the title is that if we've got to change the way it all works we may as well aim to do whatever is considered "best practice" and reasonably future-proof as far as possible rather than some expedient fudge that might not work beyond the next round of browser updates / the next W3C press release...
I find it slightly disappointing (but perhaps not surprising) that in 2017 there seems to be no sensible way of achieving this.
Perhaps "least worst practice" would be more suitable terminology...
There are many methods you can use that don't require transcoding.
WebRTC
If you're using RTSP, you're much of the way there in sending your streams via WebRTC.
WebRTC uses SDP for declaring streams, and RTP for the transport of these streams. There are some other layers you need for setting up the WebRTC call, but none of these require particularly expensive computation. Most (all?) WebRTC clients will support H.264 decoding, many with hardware acceleration in-browser.
The easiest way to get started with WebRTC is to implement a browser-to-browser client first. Then, you can go a layer deeper with your own implementation.
WebRTC is the route I recommend to you. NAT traversal (in most cases) and P2P connectivity are built-in, so your customers won't have to remember IP addresses. Simply provide signalling services and your customers can connect directly to their cameras at home from wherever. Provide TURN servers, and they'll be able to connect even if both ends are firewalled. If you don't wish to provide such services, they're lightweight and can run directly on the camera in a mode like you have today.
Fragmented MP4 over HTTP Progressive with <video> tag
This method is much simpler than WebRTC, but totally different than what you're doing now. You can take your H.264 stream, and wrap it directly in an MP4 without transcoding. Then, it can be played in a <video> tag on a page. You'll have to implement the appropriate libs in your code, but here's an FFmpeg example that outputs to STDOUT, which you'd pipe to clients:
ffmpeg \
-i YOUR_CAMERA_HERE \
-vcodec copy \
-acodec copy \
-f mp4 \
-movflags frag_keyframe+empty_moov \
-
Others...
In your case, there's no added benefit to DASH. DASH is intended for utilizing file-based CDNs for streaming. You control the server, so there's no point in writing out files or handling HTTP requests in a file-like manner. While you can certainly use DASH with H.264 streams without transcoding, I think it's a waste of your time.
HLS is much the same. Your stream is compatible with HLS, but HLS is dropping out of favor rapidly due to its lack of flexibility on codec. DASH and HLS are essentially the same mechanism... write a bunch of media segments to a CDN and create a playlist or manifest indicating where they are.
Well, I had to do the same thing while back in a raspberry pi 3. we transcoded it on the fly using ffmpeg on the pi and used https://github.com/phoboslab/jsmpeg to stream mjpeg. then played it on the browser/ionic app.
var canvas = document.getElementById('video-canvas');
this.player = new JSMpeg.Player(this.button.url ,{canvas: canvas});
We were managing up to 4 concurrent streams with minimum delay <2-5 secs on our Pis.
But once we moved to React Native we used the RN VLC wrapper on the phones
Is there a way to stream a video and audio on a website just to the clients, using a camera installed on the server - for instance, like youtube does ?
I've started reading webrtc, but if I use webrtc I should create a stun/turn server and other things, which for one way stream I think is not necessary (this is just my understanding of the things..) because I don't need anything from the clients, literally, neither their video, or audio..
So is there a way to achieve this using html5, streaming just in one direction:
server (camera) -> clients
Is there something about this out there, or should I stick with webrtc ?
I'm going to explain a possible solution for this scenario, there might be others, but I hope mine gives you a rough idea of how you could do it and a start point to explore more about the amazing possibilities of WebRTC. Please let me know if something is not understood.
So, WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. Sweet, that is: WebRTC has a quite good browser support (not in every browser though, Safari just started supporting it a month ago with Safari 11). But in this case we want to use WebRTC in the server side. At the end of the day we can still think about peer-to-peer real time communication, where one of our peers is the server.
I don't know if you are familiar with Node.js, but I recommend you to write your Server app with it (<3 Javascript!):
There are a few libraries that wrap WebRTC functionality to be used in the server side, like node-webrtc and node-rtc-peer-connection.
But I recommend you to to take a look at electron-werbrtc, since
the others might be using deprecated methods or be incomplete.
electron-webrtc runs a headless Electron client in the background to
use Chromium's built-in WebRTC implementation. So with it you should
be able to access the Camera in your server and create a stream to be
served to the other peer (the browser).
All above would be the WebRTC related tasks, in this case: streaming video peer(server)-to-peer(browser).
Now, let's talk about signaling process, stun and turn.
Signaling: imagine now a scenario peer-to-peer with 2 browsers, they want to establish a direct connection and stream video and audio between each other. But they don't know each other, like if I don't know your home address, I can't send you a letter. So they need a service that helps them know each other, so they can have the other's IP. This should be done by what is called "a signaling server". If somehow you know the other peer IP, you wouldn't need a signaling server.
STUN/TURN: the scheme above works perfectly in a local area network where each peer has its own IP address and there are no firewalls and routers between them. But otherwise, you can have peers behind a NAT or firewalls, and then your signaling server won't be able to make both peers to discover themselves. If you have peers behind a NAT, you'll need a STUN server, and if you have peers behind firewalls you'll need a TURN server. This is a bit simplified, but I just want you to have the general picture of when you might need STUN/TURN servers.
To better understand Signaling, STUN and TURN, there is a very graphic article that explains them perfectly.
Now, for your scenario:
I think you prob don't need STUN/TURN servers and also you prob don't need to implement the signaling process, because the browsers that are supposed to receive the stream from the server will know that server address, right? So they can establish a WebRTC connection with it.
EDIT: it is likely that you will need to implement some sort of handshake between the server and the clients (browsers), so this will be the signaling process. This is not part of WebRTC and this is why you need to implement it yourself. As I said, it is the way 2 peers can discover each other, but they also exchange information as their local media conditions, like codecs, resolutions they can handle, etc. For your case, your signaling server could be hosted in the same server you use to strea: you can build a small node.js app that runs there and that manages all the signaling process easily, it is not a big deal. I recommend you to read this article, and specially the section "How can I build a signaling service?". In general all WebRTC articles from that site are very helpful.
Does this make sense to you? I think with it you can start digging a little bit more and see if with this is enough or you need to implement more stuff. Hope it helps!
we were using the vlc plugin in Chrome to play a multicast stream (RTP Ipv6) but with the deprecation of NPAPI-Plugins we need an alternative. I was trying to search something about html5 video but nothing.
NPAPI deprecation: developer guide
Any idea?
Thanks
RTP directly to the browser is not a solution I'd use today. The implementation effort to transform a number of RTP packets to Media Segments accepted by the Media Source Extension (MSE) is rather high and perhaps it's not even doable on all browsers (chrome.sockets seems to be a way to do it at least on Chrome browsers). Plugin development for more than a single browser is a nasty business as well. Don't go there!
I am not sure if it fits your requirements but here is what I'd do:
I would setup a process that converts RTP packets to MPEG-DASH packets on a server. Coincidentally I implemented a solution like that. You can find it on Github as RTP2DASH. The example receives multiple qualities of the same stream from ffmpeg but you don't need that - a single video stream from any RTP source should be enough as you can run MPEG-DASH with just a single video stream. Doing DASH seems like a big overhead in the beginning but the advantage is that there are players working on all browsers such as the DASH-IF Reference Player (I wouldn't use that one) or Google's Shaka Player (which is included in my example) already there.
I'm looking for a way to securely deliver video to mobile devices. There are two options:
HLS in tag. This works very nicely for iOS and supports adaptive bitrate, perfect for mobile. However, is seems to only work well on iOS. There seems to be only fragmented support for it on Android. I've read that Android has officially supported it since 3.0, but on all the android devices I've tested (>3.0), HLS hasn't played back on the browser.
Progressive download in tag. This will work on iOS and Android devices fine, but the concern is that since it's just a progressive download of the video, that the user find a way to just grab that video once the browser has finished downloading it. This may be more difficult on iOS, but I'm sure it's not that hard to figure out where the browser stored the video download in a tmp folder somewhere.
Either method I'd say can be protected from deeplinking by using an expiring token approach, where the token is generated serverside with a secret key that only the content server knows about. The video request would only be valid for 5 or 10 minutes, would would kill of deeplinking.
Is anyone aware of any way around these issues? Even if I was able to prevent deeplinking, the user could still get the video itself and re-distribute. Perhaps it's just not possible?
Thanks
Rule #1 of the internet:
If you don't want someone stealing it, don't put it online.
Welcome to the circumvention arms race. Brought to you by DownloadHelper.
There's nothing you can do to stop someone who really wants to pirate your video. There are various measures, like those you mention, that make it more difficult, but someone who really wants to copy it could find a way to capture it from memory, or even just point a camera at the screen and record the playback of the video.
It's the same way you protect your car. You install a steering lock, an alarm and an engine immobiliser, and then someone comes alongs and pulls the car onto a flat-bed truck and drives away with it.
Bottom line - you can't stop a determined thief, but you can make theft more difficult so that you're not the most attractive target.
As I was reading the above I could easily get pass all these techniques pretty quickly.
For a project I can't describe too much because of nda, we created our own protocol based on a well known encryption method can't mention that either , military grade) , encoded packets on the server to the protocol, and decoded on the device.
unfortunately this isn't perfect either because a lot of mobile apps can be re-versed engineered and once you get the key game over, very easy on android, of course you could periodically recycle the key, in which case even if they decompiled the android app and got the key it wouldn't work very long.
This is a lot of work and can't be implemented with html5 or hLS or event rtsp.
It also requires a custom server application that takes the video stream re-transmits it with the custom protocol.
On the other hand the protocol was transport agnostic, which meant we could use a variety of transports, tcp, IAP and bluetooth. Also would work on all mobile / desktop platforms.
The other little requirement, is couldn't use a browser, have to be a custom app.
I'm trying to figure out if HTMl5 is suited for the client part of an online conference system.
The client must be capable to:
1. display live video provided by the server, using the video tag.
2. Similar for the live audio, using audio tag.
3. The system supports text messaging also. Here we can use websockets
4. There is also a desktop sharing feature. For this kind of data stream I was also thinking to websockets. But this is binary data, it can be encoded in base64 before sending. So in the html5 client, it has to be decoded, processed (it is a proprietary protocol) and using a canvas object (?!) draw it to the screen.
Can the webapp process this amount of data in the same time ?
Is it HTML5 prepared for this ?
Can webapps process this amount off data? Yes
Is HTML5 prepared for this? Not yet, but soon
These are all areas that HTML5 is working to address. However, some of the working groups are farther along than others and the features have differing levels of implementation in browsers. Ericsson is doing a lot in this area. They have a patched version of webkit that enables enough of these technologies to do usable video/audio conferencing.
In terms of desktop sharing, noVNC (VNC client in a browser) demonstrates that this is possible. noVNC (disclaimer: I wrote noVNC) does full RFB/VNC decode and render in the browser using Javascript and Canvas. It uses WebSockets to send and receive the data and base64 encode/decodes over the wire since WebSockets doesn't support binary data yet. It uses a WebSockets to TCP proxy websockify to communicate with the VNC servers. It performs quite well.
Here are linked so some of the relevant standards work:
HTML5 index
Full web-apps standard
Canvas
video and audio tags
Media capture
Media capture API
Device tag/element
WebSockets API
Current WebSockets protocol in Chrome/Safari
All WebSockets protocol drafts
ArrayBuffer/Typed Arrays
stream API
File API
The best place to see what the status of various HTML5 related technologies is: http://caniuse.com
you may want to check out the work being done by the Ericsson labs:
https://labs.ericsson.com/developer-community/blog/beyond-html5-implementing-device-and-stream-management-webkit
also look at the index page for the new device API:
https://labs.ericsson.com/developer-community?type=blog