How to stream a webcam to a server and manipulate the stream

I'd like to stream a user's webcam (from the browser) to a server and I need the server to be able to manipulate the stream (run some C algorithms on that video stream) and send the user back information.
I have looked closely at WebRTC and MediaCapture and read the examples here: https://bitbucket.org/webrtc/codelab/overview .
However, these are designed for peer-to-peer video chat. From what I understand, the MediaStream from getUserMedia is transmitted via an RTCPeerConnection (with addStream); what I'd like to know is: can I use this, but process the video stream on the server?
Thanks in advance for your help

Here is the solution I designed.
I am posting it here for people seeking the same kind of information :-)
Front End side
I use the WebRTC API: get the webcam stream with getUserMedia, and open an RTCPeerConnection (plus an RTCDataChannel for sending information back downstream).
The stream is DTLS encrypted (mandatory); the multimedia streams use RTP and RTCP. The video is VP8 encoded and the audio is Opus encoded.
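As a rough sketch of that front-end setup (the STUN server and the sendToServer signalling helper are assumptions for illustration, not part of my actual code):

// Minimal front-end sketch: capture the webcam and offer it to the gateway.
// sendToServer() stands in for whatever signalling channel you use
// (WebSocket, HTTP POST, the gateway's REST API, ...).
function sendToServer(desc) {
  console.log('would send SDP to the gateway:', desc.type); // placeholder
}

const pc = new RTCPeerConnection({ iceServers: [{ urls: 'stun:stun.l.google.com:19302' }] });
const feedback = pc.createDataChannel('results'); // downstream info from the server
feedback.onmessage = (e) => console.log('server says:', e.data);

navigator.mediaDevices.getUserMedia({ video: true, audio: true })
  .then(async (stream) => {
    stream.getTracks().forEach((track) => pc.addTrack(track, stream)); // addStream in older APIs
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    sendToServer(offer);
  });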
Back End side
On the backend, this is the complex part.
The best alternative I have found (so far) is the Janus Gateway. It takes care of a lot of things, like the DTLS handshake, RTP/RTCP demuxing, etc. Basically, it fires an event each time an RTP packet is received. (RTP packets are typically MTU-sized, so there is no 1:1 mapping between video frames and RTP packets.)
I then built a GStreamer (version 1.0) pipeline to depacketize the RTP packets, decode the VP8, and handle video scaling and colorspace/format conversion so the output is a BGR matrix (compatible with OpenCV). There is an AppSrc element at the beginning of the pipeline and an AppSink at the end.
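For illustration, here is roughly what that pipeline looks like as a gst-launch-1.0 one-liner (the caps and dimensions are assumptions; in the real program the udpsrc/fakesink ends are replaced by an appsrc fed from Janus and an appsink handing BGR buffers to OpenCV):
gst-launch-1.0 \
  udpsrc port=5004 caps="application/x-rtp,media=video,encoding-name=VP8,payload=96" ! \
  rtpvp8depay ! vp8dec ! \
  videoscale ! videoconvert ! \
  video/x-raw,format=BGR,width=640,height=480 ! \
  fakesink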
What's left to do
I still have to take extra measures to ensure good scalability (threading, memory leaks, etc.) and to find a clean and efficient way of using the C++ library I have inside this program.
Hope this helps!

Related

Current best practice to stream live video in web browser?

We develop an IP camera product which streams H.264/MPEG4/MJPEG video via RTSP/UDP. It has a web interface; currently we use the VLC Firefox plugin to allow viewing of the live RTSP stream in the browser, but Firefox is dropping support for NPAPI plugins, so that's currently a dead end.
The camera itself is a relatively low-powered ARM SoC (think Raspberry Pi level) so we don't have vast spare resource to do things like transcode streams on-the-fly on the board.
The main purpose is to check the video stream is working correctly from the web interface, so streaming a new stream (or transcoding it) in some other format/transport/streaming engine is less desirable than being able to somehow play the original RTSP stream directly. In regular use the video is streamed via RTSP into a VMS server so that's not up for alteration.
In an ideal world the solution would be open-source cross-browser and happen inside an HTML5 tag, but if it works in one or more of the most popular browsers we'll take it.
I've been reading all sorts of stuff here and around the web about the brave new world of the HTML5 video tag, WebRTC, HLS, etc., and have yet to see anything that looks like a sensible and complete solution that doesn't involve some extra conversion/transcoding/re-streaming, often by some half-supported framework or an extra server in the middle, which is not viable for us.
I haven't yet found a proper description of what may or may not be required to "convert" our stream to whatever-html5-video-likes, whether it's just a slightly different wrapper around the same basic video stream or if there's a lot of overhead and everything is different. Likewise it's not clear if the conversion could be achieved either on-board or perhaps even in-browser using JS.
The reason for the title is that if we've got to change the way it all works we may as well aim to do whatever is considered "best practice" and reasonably future-proof as far as possible rather than some expedient fudge that might not work beyond the next round of browser updates / the next W3C press release...
I find it slightly disappointing (but perhaps not surprising) that in 2017 there seems to be no sensible way of achieving this.
Perhaps "least worst practice" would be more suitable terminology...
There are many methods you can use that don't require transcoding.
WebRTC
If you're already using RTSP, you're most of the way there in terms of sending your streams via WebRTC.
WebRTC uses SDP for declaring streams, and RTP for the transport of these streams. There are some other layers you need for setting up the WebRTC call, but none of these require particularly expensive computation. Most (all?) WebRTC clients will support H.264 decoding, many with hardware acceleration in-browser.
The easiest way to get started with WebRTC is to implement a browser-to-browser client first. Then, you can go a layer deeper with your own implementation.
WebRTC is the route I recommend to you. NAT traversal (in most cases) and P2P connectivity are built-in, so your customers won't have to remember IP addresses. Simply provide signalling services and your customers can connect directly to their cameras at home from wherever. Provide TURN servers, and they'll be able to connect even if both ends are firewalled. If you don't wish to provide such services, they're lightweight and can run directly on the camera in a mode like you have today.
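On the browser side, receiving and playing such a stream is only a few lines; a sketch, with the signalling exchange (offer/answer, ICE candidates) elided since it depends entirely on your setup:

// Play the camera's incoming WebRTC stream in a <video> element.
const pc = new RTCPeerConnection();
pc.ontrack = (event) => {
  document.querySelector('video').srcObject = event.streams[0];
};
// ...exchange SDP and ICE candidates with the camera via your signalling here...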
Fragmented MP4 over HTTP Progressive with <video> tag
This method is much simpler than WebRTC, but totally different than what you're doing now. You can take your H.264 stream, and wrap it directly in an MP4 without transcoding. Then, it can be played in a <video> tag on a page. You'll have to implement the appropriate libs in your code, but here's an FFmpeg example that outputs to STDOUT, which you'd pipe to clients:
ffmpeg \
-i YOUR_CAMERA_HERE \
-vcodec copy \
-acodec copy \
-f mp4 \
-movflags frag_keyframe+empty_moov \
-
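To sketch the piping itself (Node.js chosen here just for illustration; the port and input URL are placeholders), each HTTP client gets a freshly spawned ffmpeg writing fragmented MP4 straight into the response:

// Spawn ffmpeg per client and pipe its fragmented-MP4 stdout to the response.
const http = require('http');
const { spawn } = require('child_process');

http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'video/mp4' });
  const ffmpeg = spawn('ffmpeg', [
    '-i', 'rtsp://YOUR_CAMERA_HERE', // placeholder input
    '-vcodec', 'copy', '-acodec', 'copy',
    '-f', 'mp4', '-movflags', 'frag_keyframe+empty_moov',
    '-',
  ]);
  ffmpeg.stdout.pipe(res);
  req.on('close', () => ffmpeg.kill('SIGKILL')); // stop ffmpeg when the client leaves
}).listen(8080);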
Others...
In your case, there's no added benefit to DASH. DASH is intended for utilizing file-based CDNs for streaming. You control the server, so there's no point in writing out files or handling HTTP requests in a file-like manner. While you can certainly use DASH with H.264 streams without transcoding, I think it's a waste of your time.
HLS is much the same. Your stream is compatible with HLS, but HLS is rapidly falling out of favor due to its lack of codec flexibility. DASH and HLS are essentially the same mechanism: write a bunch of media segments to a CDN and create a playlist or manifest indicating where they are.
Well, I had to do the same thing a while back on a Raspberry Pi 3. We transcoded the video on the fly using ffmpeg on the Pi and used https://github.com/phoboslab/jsmpeg to stream it as MPEG-1, then played it in the browser / Ionic app.
// Attach a JSMpeg player to a canvas element; this.button.url points at the stream.
var canvas = document.getElementById('video-canvas');
this.player = new JSMpeg.Player(this.button.url, { canvas: canvas });
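For reference, the jsmpeg project's documented setup feeds MPEG-1 in an MPEG-TS container into its small WebSocket relay; something along these lines (the resolution, bitrate, URLs, and the relay's secret are all assumptions to adapt):
# Transcode the camera feed to MPEG-1/MPEG-TS and push it to jsmpeg's
# websocket-relay over HTTP (values here are placeholders).
ffmpeg -i rtsp://YOUR_CAMERA_HERE \
  -f mpegts -codec:v mpeg1video -s 640x360 -b:v 800k -bf 0 \
  http://localhost:8081/supersecret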
We were managing up to 4 concurrent streams with minimal delay (2-5 seconds) on our Pis.
But once we moved to React Native, we used the RN VLC wrapper on the phones.

Chrome native messaging: can I stream a MediaStream to a native program?

I am writing a web application which needs to show a native window in the host window system. That window must display a video which is being streamed to the web application.
I have written a native program for OS X which displays a video in the way I need, and in the web application I have a MediaStream being sent via WebRTC. I need to connect these together.
I would like to use Chrome's native messaging, which lets me stream JSON objects to a native program. If I can access the raw data stream from the MediaStream, I should be able to transform this into JSON objects, stream those to the native application, where I can reconstruct the raw video stream.
Is something like this possible?
If possible, I strongly recommend implementing a WebRTC media server in your native application and communicating directly between the browser's WebRTC APIs and your server. Anything else has much more overhead.
For example, to go from a MediaStream to native messaging, you need a way to serialize the audio and video feeds of the MediaStream to a sequence of bytes, and then send them over the native messaging channel (which will be JSON-encoded by the browser and then JSON-decoded by your native app).
For audio, you could use audioContext.createMediaStreamSource to bridge from a MediaStream (from WebRTC) to an Audio node (in the Web Audio API), and then use offlineAudioCtx.startRendering to convert from an audio node to raw bytes.
For video, you could paint the video on a canvas and then continuously use toDataURL or toBlob to get the underlying data and send it over the wire. (See "Taking still photos with WebRTC" on MDN for a tutorial on taking a single picture; this can be generalized to multiple frames.)
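A sketch of that frame-capture loop (the frame rate and the sendOverNativeMessaging helper are assumptions for illustration):

// Draw each video frame onto a canvas, then serialize it as a JPEG blob.
const video = document.querySelector('video'); // element playing the MediaStream
const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');

setInterval(() => {
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  ctx.drawImage(video, 0, 0);
  canvas.toBlob((blob) => sendOverNativeMessaging(blob), 'image/jpeg', 0.7);
}, 1000 / 15); // ~15 fps; a placeholder rate

function sendOverNativeMessaging(blob) {
  // placeholder: encode the blob and post it via chrome.runtime.sendNativeMessage
}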
This sounds very inefficient, and it probably is, so you'd better implement a WebRTC media server in your native app to get some reasonable performance.

Streaming adaptive audio on the web (low latency)

I am attempting to implement a streaming audio solution for the web. My requirements are these:
Relatively low latency (no more than 2 seconds).
Streaming in a compressed format (Ogg Vorbis/MP3) to save on bandwidth.
The stream is generated on the fly and is unique for each client.
To clarify the last point, my case does not fit the usual pattern of having a stream generated somewhere and then broadcast to the clients using something like Shoutcast. The stream is dynamic and will adapt based on client input, which I handle separately using regular HTTP requests to the same server.
Initially I looked at streaming Vorbis/MP3 as HTTP chunks for use with the HTML5 audio tag, but after some more research I found many people saying that the audio tag has pretty high latency, which disqualifies it for this project.
I also looked into Emscripten which would allow me to play audio using SDL2, but the prospect of decoding Vorbis and MP3 in the browser is not too appealing.
I am looking to implement the server in C++ (probably using the asynchronous facilities of boost.asio), and to have as small a codebase as possible for playback in the browser (the more the browser does implicitly the better). Can anyone recommend a solution?
P.S. I have no problem implementing streaming protocol support from scratch in C++ if there are no ready-to-use libraries that fit the bill.
You should look into Media Source Extensions (MSE).
Introduction: http://en.wikipedia.org/wiki/Media_Source_Extensions
Specification: https://w3c.github.io/media-source/
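A minimal client-side sketch with MSE (the /stream endpoint and the 'audio/mpeg' type are assumptions; check MediaSource.isTypeSupported() against whatever your C++ server will actually emit):

// Feed an HTTP-chunked audio stream into an <audio> element via MSE.
const audio = document.querySelector('audio');
const mediaSource = new MediaSource();
audio.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sb = mediaSource.addSourceBuffer('audio/mpeg'); // assumed codec/container
  const reader = (await fetch('/stream')).body.getReader(); // hypothetical endpoint
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    sb.appendBuffer(value);
    // Wait until this chunk is consumed before appending the next one.
    await new Promise((r) => sb.addEventListener('updateend', r, { once: true }));
  }
  mediaSource.endOfStream();
});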

Stream audio and video data in network in html5

How do I stream audio and video data and pass it over the network? I have gone through a good article here, but it did not go into depth. I want to build a chat application in HTML5.
My main questions are:
How to stream the audio and video data
How to send it to a particular IP address
How to receive that data and pass it to video and audio controls
If you want to serve a stream, you need a server doing so, either by downloading and installing one, or by coding your own.
Streams only work in one direction; there is no responding or "retrieving back". Streaming is almost the same as downloading, with slight differences depending on the service and use case.
Most streams are downstreams, but there are also upstreams. Have you heard about BufferStreams in PHP, Java, whatever? It's basically the same: data -> direction -> cursor.
Streams work over many protocols, even via different network layers, for example: network/subnet broadcast, peer-to-peer, HTTP, DLNA, even FTP streams, ...
The basic nature of a stream is nothing more than data being sent to an audience.
You need to decide:
which protocol do you want to use for streaming
which server software
which media / source / live or with selectable start/end
which clients
The most popular HTTP streaming server is Shoutcast by Nullsoft (Winamp).
There is also DLNA, which as far as I know is not HTTP-based.
To provide more information, you need to be more specific regarding your basic requirements and decisions.

Sending HD movie frames through sockets to Flash

I was wondering if someone has ever done something like this. I have an HD movie (or even a 720p one) and I want to send it to a Flash client. I was thinking of using OpenCV in C++ for the decoding and sending part. I had even implemented some of this, but ran into problems with wrong packet sizes.
But my question is different: has anyone done anything similar to this? Does it stand a chance of improving performance? I have strong doubts, because I think the sending and decoding will still be hard work for the Flash machine. Looking forward to hearing some opinions from more experienced people.
Not a real answer, more like thoughts about your problem:
Yes, you must encode the HD images; sending 25 fps x 1.5 MB over the net is a no-go.
GStreamer was built for exactly that purpose. Complicated, maybe, but look at it anyway!
Why write a program when VLC can do all of this already? (even headless/scripted!)
If there's audio to stream too, forget OpenCV. It's a computer-vision lib, not built for your problem.
There are essentially two network protocols that are commonly used to send video from a server to a flash client, HTTP, and RTMP.
HTTP is a well-known standard that is easy to implement because it is a plain-text protocol; it allows Flash Player to play on-demand video files or do what is called pseudo-streaming.
RTMP is a proprietary protocol created by Adobe, that allows real-time streaming as well as video on demand, and can also transport structured binary data (the AMF format) to act as a remote procedure call protocol.
Although it is now documented, it is much more complicated to implement than HTTP, but there is an open-source library that implements this protocol, librtmp, found at http://rtmpdump.mplayerhq.hu/.
Please note that I have used librtmp with success on the client side, to have a C program act as a Flash client publishing video to an FMS server. I have no experience using it on the server side; I don't even know if that's possible at all.
In your case I certainly recommend using HTTP.
Now there is another problem to overcome: for video frames to be properly recognized, they must be embedded in a container that the Flash player can read.
Flash currently supports two container formats, FLV and F4V, the latter being a subset of the MPEG-4 container format.
Also, the video stream must be readable by Flash, and so it must be properly encoded into a format supported on the client-side, for example H.264, Sorensen, or VP6.
It is possible to directly send GIF, JPEG or PNG images as frames, as seen on page 8 of the official Flash Video Specification, but you must realize that at HD resolution this will be extremely inefficient; just imagine that at 25 FPS, a single 1920x1080 JPEG image is much bigger than the equivalent H.264 frame.
So, in the end, my advice is: do not decode the video on the server, make sure it is in a format compatible with Flash, and use a well-documented protocol to send it as-is.
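If the source is already H.264, that repackaging can be a simple remux; an illustrative ffmpeg one-liner in the same spirit as the example above (the input is a placeholder, and it assumes the audio track is already FLV-compatible, e.g. AAC or MP3):
# Copy the existing H.264/audio streams into an FLV wrapper, writing to STDOUT
# so the result can be piped to Flash clients over plain HTTP.
ffmpeg \
  -i YOUR_VIDEO_HERE \
  -vcodec copy \
  -acodec copy \
  -f flv \
  -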