I want to automate a transcoding workflow to H.264 in the adaptive streaming containers for HLS and Microsoft Smooth Streaming, and I wonder what my options are.
Ideally, I would use Expression Encoder Pro with the Expression SDK to do just that. However, Expression Encoder Pro is no longer for sale and the non-Pro version can't do H.264.
There are other H.264 encoders; in particular, x264 is an encoder proper that is GPL-licensed. However, x264 really just gives a pure stream output without the container, let alone the adaptive streaming containers I need.
I found one reasonably priced encoder called Sorenson Squeeze that appears to have all I need (and in fact can use x264 for that part of the job), but I wonder if I have other options that make more sense in terms of spending money on licenses.
I already have licenses for Adobe's Media Encoder through Creative Cloud subscriptions, but Media Encoder can't work from the command line and I don't see any support for adaptive streaming with my desired containers.
Does anybody have more ideas?
FFmpeg and/or Libav can transcode to H.264, support Smooth Streaming and HLS, and run from the command line. There's a bit of a learning curve (in practice you need an understanding of the container formats used, GOP structure, and fragmentation/segmentation), but they do have the features you need.
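As a rough illustration of the FFmpeg route, the sketch below builds the argument list for a single H.264/AAC HLS rendition. The helper name, file names, bitrates, frame rate, and the 4-second segment length are assumptions chosen for the example, not values from the question. The GOP length is tied to the segment length so that every segment can start on a keyframe, which is the part of the learning curve mentioned above.

```javascript
// Hypothetical helper: builds ffmpeg arguments for one HLS rendition.
// Bitrates, segment length and output names are illustrative assumptions;
// fps is assumed to match the frame rate of the source.
function hlsRenditionArgs(input, { name, height, videoKbps, audioKbps, fps = 25, segmentSeconds = 4 }) {
  const gop = fps * segmentSeconds; // one keyframe at every segment boundary
  return [
    '-i', input,
    '-vf', `scale=-2:${height}`,   // keep aspect ratio, force an even width
    '-c:v', 'libx264',
    '-b:v', `${videoKbps}k`,
    '-g', String(gop),              // maximum keyframe interval
    '-keyint_min', String(gop),     // minimum keyframe interval
    '-sc_threshold', '0',           // no extra keyframes on scene cuts
    '-c:a', 'aac',
    '-b:a', `${audioKbps}k`,
    '-f', 'hls',
    '-hls_time', String(segmentSeconds),
    '-hls_playlist_type', 'vod',
    '-hls_segment_filename', `${name}_%03d.ts`,
    `${name}.m3u8`,
  ];
}

const args = hlsRenditionArgs('input.mp4', { name: '720p', height: 720, videoKbps: 2800, audioKbps: 128 });
console.log(['ffmpeg', ...args].join(' '));
```

Calling the helper once per rendition and running the resulting commands would give one playlist per bitrate; a master playlist that lists them still has to be written separately.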
If your media is on your local machine and you only have small amounts of it, buying one of the tools you mentioned might be your best bet.
However, if you have lots of media and you store it on the cloud, look at cloud offerings such as Amazon Elastic Transcoder or encoding.com.
That way you get out-of-the-box support for formats like HLS, and you don't need to worry about licensing: it is all included in their pricing, which is per use, with no subscription or upfront costs.
For MPEG-DASH adaptive bitrate content, for example, you can use either tools such as x264 + MP4Box or cloud services like bitcodin.
Related
I am trying to build a web application which will need to have audio streaming functionality implemented in some way. Just to give you guys some context: It is designed to be a purely auditive experience/game/idkhowtocallit with lots of different sound assets varying in length and thus file size. The sound assets to be provided will consist of ambient sounds, spoken bits of conversation, but also long music sets (up to a couple of hours). Why I think I won't be able to just host these audio files on some server or CDN and serve them from there is, because the sound assets will need to be fetched and played dynamically (depending on user interaction) and as instantly as possible.
Most importantly, consuming larger files (like the music sets and long ambient loops) as a whole doesn't seem to be client-friendly at all to me (used data consumption on mobile networks and client-side memory usage).
Also, without any buffering or streaming mechanism, the client won't be able to start playing these files before they are downloaded completely, right? Which would add the issue of high latencies.
I've tried to do some online research on how to properly implement a good infrastructure to stream bigger audio files to clients on the server side and found HLS and MPEG-DASH. I have some experience with consuming HLS streams with web players, and if I understand it correctly, I would use some sort of one-time transformation process (on or after file upload) to split the files into chunks and create the playlist, and then just serve these files via HTTP. From what I understand, the process should be more or less the same for MPEG-DASH. My issue with these two techniques is that I couldn't really find any documentation on how to implement JavaScript/TypeScript clients (particularly using the Web Audio API) without reinventing the wheel. My best guess would be to use something like hls.js, bind the HLS streams to freshly created audio elements, and use these elements to create AudioSources in my Web Audio graph. How far off am I? I'm trying to get at least an idea of a best practice.
To sum up what I would really appreciate to get some clarity about:
Would HLS or MPEG-DASH really be the way to go or am I missing a more basic chunked file streaming mechanism with good libraries?
How - theoretically - would I go about limiting the amount of chunks downloaded in advance on the client side to save client-side resources, which is one of my biggest concerns?
I was looking into hosting services as well, but figured that most of them are specialized in hosting podcasts (fewer but very large files). Has anyone an opinion about whether I could use these services to host and stream possibly 1000s of files with sizes ranging from very small to rather large?
Thank you so much in advance to everyone who will be bothered with helping me out. Really appreciate it.
Why I think I won't be able to just host these audio files on some server or CDN and serve them from there is, because the sound assets will need to be fetched and played dynamically (depending on user interaction) and as instantly as possible.
Your long-running ambient sounds can stream using a normal HTMLAudioElement. When you play them, there may be a little lag before they start, since they have to begin streaming, but note that the browser will generally prefetch the metadata and maybe even the beginning of the media data.
For short sounds where latency is critical (like one-shot user interaction sound effects), load those into buffers with the Web Audio API for playback. You won't be able to stream them, but they'll play as instantly as you can get.
Most importantly, consuming larger files (like the music sets and long ambient loops) as a whole doesn't seem to be client-friendly at all to me (used data consumption on mobile networks and client-side memory usage).
If you want to play the audio, you naturally have to download that audio. You can't play something you haven't loaded in some way. If you use an audio element, you won't be downloading much more than what is being played. And, that downloading is mostly going to occur on-demand.
Also, without any buffering or streaming mechanism, the client won't be able to start playing these files before they are downloaded completely, right? Which would add the issue of high latencies.
If you use an audio element, the browser takes care of all the buffering and what not for you. You don't have to worry about it.
I've tried to do some online research on how to properly implement a good infrastructure to stream bigger audio files to clients on the server side and found HLS and MPEG-DASH.
If you're only streaming a single bitrate (which for audio is usually fine) and you're not streaming live content, then there's no point to HLS or DASH here.
Would HLS or MPEG-DASH really be the way to go or am I missing a more basic chunked file streaming mechanism with good libraries?
The browser will make ranged HTTP requests to get the data it needs out of the regular static media file. You don't need to do anything special to stream it. Just make sure your server is configured to handle ranged requests... most any should be able to do this right out of the box.
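To make the ranged-request mechanism concrete, here is a minimal sketch of what a server does with a Range header: it resolves the requested byte range against the file size and answers with 206 Partial Content. The function name and the returned object shape are made up for the example, and multi-range requests are deliberately left out; most servers and CDNs already do this for you.

```javascript
// Resolves a single-range "Range: bytes=..." header against a file size.
// Returns null for anything unsupported or unsatisfiable (multi-range
// requests are deliberately ignored in this sketch).
function resolveRange(header, fileSize) {
  const match = /^bytes=(\d*)-(\d*)$/.exec(header || '');
  if (!match || (match[1] === '' && match[2] === '')) return null;

  let start, end;
  if (match[1] === '') {
    // Suffix form "bytes=-500": the last 500 bytes of the file.
    const suffix = Number(match[2]);
    if (suffix === 0) return null;
    start = Math.max(fileSize - suffix, 0);
    end = fileSize - 1;
  } else {
    start = Number(match[1]);
    // Open-ended form "bytes=1000-" runs to the end of the file.
    end = match[2] === '' ? fileSize - 1 : Math.min(Number(match[2]), fileSize - 1);
  }
  if (start > end || start >= fileSize) return null;

  return {
    status: 206,
    start,
    end,
    headers: {
      'Accept-Ranges': 'bytes',
      'Content-Range': `bytes ${start}-${end}/${fileSize}`,
      'Content-Length': String(end - start + 1),
    },
  };
}

console.log(resolveRange('bytes=0-1023', 5000).headers['Content-Range']); // bytes 0-1023/5000
```

A null result would normally turn into a 416 Range Not Satisfiable response, or into a plain 200 with the whole file when no usable Range header was sent.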
How - theoretically - would I go about limiting the amount of chunks downloaded in advance on the client side to save client-side resources, which is one of my biggest concerns?
The browser does this for you if you use an audio element. Additionally, data saving settings and the detected connectivity speed may impact whether or not the browser pre-fetches. The point is, you don't have to worry about this. You'll only be using what you need.
Just make sure you're compressing your media as efficiently as you can for the required audio quality. Use a good codec like Opus or AAC.
I was looking into hosting services as well, but figured that most of them are specialized in hosting podcasts (fewer but very large files). Has anyone an opinion about whether I could use these services to host and stream possibly 1000s of files with sizes ranging from very small to rather large?
Most any regular HTTP CDN will work just fine.
One final note for you... beware of iOS and Safari. Thanks to Apple's restrictive policies, all browsers under iOS are effectively Safari. Safari is incapable of playing more than one audio element at a time. If you use the Web Audio API you have more flexibility, but the Web Audio API has no real provision for streaming. You can use a media element source node, but this breaks lock screen metadata and outright doesn't work on some older versions of iOS. TL;DR; Safari is all but useless for audio on the web, and Apple's business practices have broken any alternatives.
We develop an IP camera product which streams H.264/MPEG4/MJPEG video via RTSP/UDP. It has a web interface, and we currently use the VLC Firefox plugin to allow viewing of the live RTSP stream in the browser, but Firefox is dropping support for NPAPI plugins, so that's a dead end.
The camera itself is a relatively low-powered ARM SoC (think Raspberry Pi level), so we don't have vast spare resources to do things like transcode streams on-the-fly on the board.
The main purpose is to check the video stream is working correctly from the web interface, so streaming a new stream (or transcoding it) in some other format/transport/streaming engine is less desirable than being able to somehow play the original RTSP stream directly. In regular use the video is streamed via RTSP into a VMS server so that's not up for alteration.
In an ideal world the solution would be open-source, cross-browser, and happen inside an HTML5 tag, but if it works in one or more of the most popular browsers we'll take it.
I've been reading all sorts of stuff here and around the web about the brave new world of the HTML5 video tag, WebRTC, HLS, etc. and have yet to see anything that looks like a sensible and complete solution that doesn't involve some extra conversion/transcoding/re-streaming, often by some half-supported framework or an extra server in the middle which is not a viable solution.
I haven't yet found a proper description of what may or may not be required to "convert" our stream to whatever-html5-video-likes, whether it's just a slightly different wrapper around the same basic video stream or if there's a lot of overhead and everything is different. Likewise it's not clear if the conversion could be achieved either on-board or perhaps even in-browser using JS.
The reason for the title is that if we've got to change the way it all works we may as well aim to do whatever is considered "best practice" and reasonably future-proof as far as possible rather than some expedient fudge that might not work beyond the next round of browser updates / the next W3C press release...
I find it slightly disappointing (but perhaps not surprising) that in 2017 there seems to be no sensible way of achieving this.
Perhaps "least worst practice" would be more suitable terminology...
There are many methods you can use that don't require transcoding.
WebRTC
If you're using RTSP, you're much of the way there in sending your streams via WebRTC.
WebRTC uses SDP for declaring streams, and RTP for the transport of these streams. There are some other layers you need for setting up the WebRTC call, but none of these require particularly expensive computation. Most (all?) WebRTC clients will support H.264 decoding, many with hardware acceleration in-browser.
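Since RTSP also describes its streams with SDP (in the DESCRIBE response), a first sanity check is whether the camera's SDP already advertises a codec the browser can decode. The sketch below pulls the RTP payload types for a codec out of the a=rtpmap lines. The function name and the SDP fragment are invented for illustration; a real WebRTC offer carries many more attributes (ICE, DTLS fingerprints, and so on).

```javascript
// Lists the payload types an SDP description maps to a given codec,
// e.g. "a=rtpmap:96 H264/90000" -> payload type 96 for "H264".
function payloadTypesFor(sdp, codec) {
  const types = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = /^a=rtpmap:(\d+) ([^/]+)\/(\d+)/.exec(line.trim());
    if (match && match[2].toUpperCase() === codec.toUpperCase()) {
      types.push({ payloadType: Number(match[1]), clockRate: Number(match[3]) });
    }
  }
  return types;
}

// Made-up SDP fragment of the kind an RTSP DESCRIBE response might carry.
const exampleSdp = [
  'v=0',
  'm=video 0 RTP/AVP 96',
  'a=rtpmap:96 H264/90000',
  'a=fmtp:96 packetization-mode=1;profile-level-id=42e01f',
  'm=audio 0 RTP/AVP 0',
  'a=rtpmap:0 PCMU/8000',
].join('\r\n');

console.log(payloadTypesFor(exampleSdp, 'H264'));
```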
The easiest way to get started with WebRTC is to implement a browser-to-browser client first. Then, you can go a layer deeper with your own implementation.
WebRTC is the route I recommend to you. NAT traversal (in most cases) and P2P connectivity are built-in, so your customers won't have to remember IP addresses. Simply provide signalling services and your customers can connect directly to their cameras at home from wherever. Provide TURN servers, and they'll be able to connect even if both ends are firewalled. If you don't wish to provide such services, they're lightweight and can run directly on the camera in a mode like you have today.
Fragmented MP4 over HTTP Progressive with <video> tag
This method is much simpler than WebRTC, but totally different than what you're doing now. You can take your H.264 stream, and wrap it directly in an MP4 without transcoding. Then, it can be played in a <video> tag on a page. You'll have to implement the appropriate libs in your code, but here's an FFmpeg example that outputs to STDOUT, which you'd pipe to clients:
ffmpeg \
-i YOUR_CAMERA_HERE \
-vcodec copy \
-acodec copy \
-f mp4 \
-movflags frag_keyframe+empty_moov \
-
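To check that what you pipe to clients really is fragmented MP4, it helps to look at the top-level box structure: with the flags above, the output should start with an ftyp box and an initial moov box, followed by repeating moof/mdat pairs. The sketch below walks those top-level boxes. The function names are hypothetical and the sample bytes are synthetic empty boxes rather than real ffmpeg output, so treat the printed sequence as illustrative only.

```javascript
// Walks the top-level boxes of an MP4 byte stream. Each box starts with a
// 32-bit big-endian size followed by a four-character type.
function listTopLevelBoxes(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const boxes = [];
  let offset = 0;
  while (offset + 8 <= bytes.byteLength) {
    let size = view.getUint32(offset);
    const type = String.fromCharCode(...bytes.subarray(offset + 4, offset + 8));
    if (size === 1) {
      // A size of 1 means a 64-bit "largesize" follows the type field.
      if (offset + 16 > bytes.byteLength) break;
      size = Number(view.getBigUint64(offset + 8));
    } else if (size === 0) {
      size = bytes.byteLength - offset; // box extends to the end of the data
    }
    if (size < 8) break; // malformed box: stop rather than loop forever
    boxes.push({ type, offset, size });
    offset += size;
  }
  return boxes;
}

// Builds a box with a zero-filled payload, only to demonstrate the parser.
function emptyBox(type, payloadLength = 0) {
  const box = new Uint8Array(8 + payloadLength);
  new DataView(box.buffer).setUint32(0, box.length);
  for (let i = 0; i < 4; i++) box[4 + i] = type.charCodeAt(i);
  return box;
}

const sample = Uint8Array.from([
  ...emptyBox('ftyp', 8), ...emptyBox('moov', 4), ...emptyBox('moof'), ...emptyBox('mdat', 16),
]);
console.log(listTopLevelBoxes(sample).map((box) => box.type).join(' ')); // ftyp moov moof mdat
```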
Others...
In your case, there's no added benefit to DASH. DASH is intended for utilizing file-based CDNs for streaming. You control the server, so there's no point in writing out files or handling HTTP requests in a file-like manner. While you can certainly use DASH with H.264 streams without transcoding, I think it's a waste of your time.
HLS is much the same. Your stream is compatible with HLS, but HLS is dropping out of favor rapidly due to its lack of codec flexibility. DASH and HLS are essentially the same mechanism: write a bunch of media segments to a CDN and create a playlist or manifest indicating where they are.
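To show how thin the "playlist" half of that mechanism is, here is a minimal sketch that writes an HLS media playlist for a list of already-written segments. The function name, segment names, and durations are invented for the example; a DASH manifest plays the same role in XML form.

```javascript
// Builds a VOD HLS media playlist for segments that already exist on the
// server or CDN. Each segment is { uri, duration } with duration in seconds.
function buildMediaPlaylist(segments) {
  // The target duration must be an integer no smaller than any segment.
  const target = Math.ceil(Math.max(...segments.map((segment) => segment.duration)));
  const lines = [
    '#EXTM3U',
    '#EXT-X-VERSION:3',
    `#EXT-X-TARGETDURATION:${target}`,
    '#EXT-X-MEDIA-SEQUENCE:0',
    '#EXT-X-PLAYLIST-TYPE:VOD',
  ];
  for (const { uri, duration } of segments) {
    lines.push(`#EXTINF:${duration.toFixed(3)},`, uri);
  }
  lines.push('#EXT-X-ENDLIST'); // tells the player no more segments will follow
  return lines.join('\n') + '\n';
}

console.log(buildMediaPlaylist([
  { uri: 'seg_000.ts', duration: 4 },
  { uri: 'seg_001.ts', duration: 4 },
  { uri: 'seg_002.ts', duration: 2.48 },
]));
```

For a live stream the playlist would instead be rewritten as segments appear, without the ENDLIST tag and with an increasing media sequence number.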
Well, I had to do the same thing a while back on a Raspberry Pi 3. We transcoded the stream on the fly using ffmpeg on the Pi and used https://github.com/phoboslab/jsmpeg to stream mjpeg, then played it in the browser/Ionic app.
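// Render the decoded video onto an existing <canvas id="video-canvas"> element.
var canvas = document.getElementById('video-canvas');
// this.button.url holds the URL of the stream to play.
this.player = new JSMpeg.Player(this.button.url, { canvas: canvas });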
We were managing up to 4 concurrent streams with minimal delay (under 2-5 seconds) on our Pis.
But once we moved to React Native, we used the RN VLC wrapper on the phones.
We were using the VLC plugin in Chrome to play a multicast stream (RTP over IPv6), but with the deprecation of NPAPI plugins we need an alternative. I tried searching for something about HTML5 video but found nothing.
NPAPI deprecation: developer guide
Any idea?
Thanks
RTP directly to the browser is not a solution I'd use today. The implementation effort to transform a number of RTP packets to Media Segments accepted by the Media Source Extension (MSE) is rather high and perhaps it's not even doable on all browsers (chrome.sockets seems to be a way to do it at least on Chrome browsers). Plugin development for more than a single browser is a nasty business as well. Don't go there!
I am not sure if it fits your requirements but here is what I'd do:
I would set up a process that converts RTP packets to MPEG-DASH packets on a server. Coincidentally, I implemented a solution like that. You can find it on GitHub as RTP2DASH. The example receives multiple qualities of the same stream from ffmpeg, but you don't need that: a single video stream from any RTP source should be enough, as you can run MPEG-DASH with just a single video stream. Doing DASH seems like a big overhead in the beginning, but the advantage is that players that work on all browsers already exist, such as the DASH-IF Reference Player (I wouldn't use that one) or Google's Shaka Player (which is included in my example).
I am attempting to implement a streaming audio solution for the web. My requirements are these:
Relatively low latency (no more than 2 seconds).
Streaming in a compressed format (Ogg Vorbis/MP3) to save on bandwidth.
The stream is generated on the fly and is unique for each client.
To clarify the last point, my case does not fit the usual pattern of having a stream being generated somewhere and then broadcast to the clients using something like Shoutcast. The stream is dynamic and will adapt based on client input which I handle separately using regular http requests to the same server.
Initially I looked at streaming Vorbis/MP3 as http chunks for use with the html5 audio tag, but after some more research I found a lot of people who say that the audio tag has pretty high latency which disqualifies it for this project.
I also looked into Emscripten which would allow me to play audio using SDL2, but the prospect of decoding Vorbis and MP3 in the browser is not too appealing.
I am looking to implement the server in C++ (probably using the asynchronous facilities of boost.asio), and to have as small a codebase as possible for playback in the browser (the more the browser does implicitly the better). Can anyone recommend a solution?
P.S. I have no problem implementing streaming protocol support from scratch in C++ if there are no ready to use libraries that fit the bill.
You should look into Media Source Extensions.
Introduction: http://en.wikipedia.org/wiki/Media_Source_Extensions
Specification: https://w3c.github.io/media-source/
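One practical detail when feeding a generated stream into MSE: a SourceBuffer does not accept an appendBuffer() call while a previous append is still being processed, so incoming chunks need a small queue. The sketch below shows one way to do that. The class name is made up, and it only relies on the `updating` property, `appendBuffer()`, and the `updateend` event of SourceBuffer.

```javascript
// Queues chunks for an MSE SourceBuffer. appendBuffer() may only be called
// while sourceBuffer.updating is false, so chunks wait until "updateend".
class AppendQueue {
  constructor(sourceBuffer) {
    this.sourceBuffer = sourceBuffer;
    this.pending = [];
    // Try the next chunk every time the previous append has finished.
    sourceBuffer.addEventListener('updateend', () => this.flush());
  }

  // Call this for every chunk that arrives from the network.
  push(chunk) {
    this.pending.push(chunk);
    this.flush();
  }

  flush() {
    if (this.sourceBuffer.updating || this.pending.length === 0) return;
    this.sourceBuffer.appendBuffer(this.pending.shift());
  }
}
```

In a browser, the source buffer would come from `mediaSource.addSourceBuffer(mimeType)` inside the `sourceopen` handler of a MediaSource attached to an audio element. Which MIME types are accepted varies by browser, so it is worth checking `MediaSource.isTypeSupported(mimeType)` first.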
I was wondering if someone has ever done something like this. I have an HD movie (or even a 720p one) and I want to send it to a Flash client. I was thinking of using OpenCV in C++ for the decoding and sending part. I have even implemented some of this, but have problems with wrong packet sizes.
But my question is different: has anyone done anything similar to this? Could this offer a chance for performance improvement? I have strong doubts about this, because I think the sending and decoding will still be difficult for the Flash machine. Looking forward to hearing some opinions from more experienced guys.
not a real answer, more like thoughts about your problem:
yes, you must encode HD images; sending 25 fps x 1.5 MB over the net is a no-go.
GStreamer was built for exactly that purpose. complicated, maybe, but look at it anyway!
why write a program when VLC can do all of this already? (even headless/scripted!)
if there's audio to stream, too, forget OpenCV. it's a computer-vision lib, not built for your problem there.
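The arithmetic behind the first point can be sketched as follows. The function name is made up, and the pixel formats are assumptions: roughly 1.5 bytes per pixel for 4:2:0 YUV (which lands close to the 1.5 MB per frame mentioned above at 720p) and 3 bytes per pixel for 24-bit RGB.

```javascript
// Bandwidth needed to send uncompressed frames at a given frame rate.
function rawVideoBitrate(width, height, bytesPerPixel, fps) {
  const bytesPerFrame = width * height * bytesPerPixel;
  return {
    bytesPerFrame,
    megabitsPerSecond: (bytesPerFrame * fps * 8) / 1e6,
  };
}

// 720p, 4:2:0 YUV: about 1.38 MB per frame, about 276 Mbit/s at 25 fps.
console.log(rawVideoBitrate(1280, 720, 1.5, 25));
// 1080p, 24-bit RGB: about 6.2 MB per frame, about 1244 Mbit/s at 25 fps.
console.log(rawVideoBitrate(1920, 1080, 3, 25));
```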
There are essentially two network protocols that are commonly used to send video from a server to a Flash client: HTTP and RTMP.
HTTP is a well-known standard, easily implemented because it is a plain-text protocol, that allows Flash Player to play on-demand video files, or do what is called pseudo-streaming.
RTMP is a proprietary protocol created by Adobe, that allows real-time streaming as well as video on demand, and can also transport structured binary data (the AMF format) to act as a remote procedure call protocol.
Although now documented, it is much more complicated to implement than HTTP, but there is an open-source library that implements this protocol, librtmp, found at http://rtmpdump.mplayerhq.hu/.
Please note that I have used librtmp with success on the client side, to have a C program act as a Flash client and publish video to an FMS server. I have no experience of using it on the server side; I don't even know if it's possible at all.
In your case I certainly recommend using HTTP.
Now there is another problem to overcome, it is the fact that for video frames to be properly recognized, they must be embedded in a container that the Flash player can read.
Flash currently supports two container formats, FLV and F4V, the latter being a subset of the MPEG-4 container format.
Also, the video stream must be readable by Flash, so it must be properly encoded into a format supported on the client side, for example H.264, Sorenson Spark, or VP6.
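A quick way to check that what you are about to send is at least framed as FLV is to look at the 9-byte file header: the signature "FLV", a version byte, a flags byte saying whether audio and video tags are present, and the header size. The sketch below parses it. The function name and the sample bytes are made up for the example, and it does not look at the codec inside the tags.

```javascript
// Parses the 9-byte FLV file header: "FLV" signature, version byte,
// flags byte (bit 2 = audio present, bit 0 = video present) and a
// 32-bit big-endian header size.
function parseFlvHeader(bytes) {
  if (bytes.length < 9 || bytes[0] !== 0x46 || bytes[1] !== 0x4c || bytes[2] !== 0x56) {
    return null; // not an FLV stream
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    version: bytes[3],
    hasAudio: (bytes[4] & 0x04) !== 0,
    hasVideo: (bytes[4] & 0x01) !== 0,
    headerSize: view.getUint32(5),
  };
}

// Synthetic header: version 1, audio and video present, header size 9.
console.log(parseFlvHeader(Uint8Array.from([0x46, 0x4c, 0x56, 1, 0x05, 0, 0, 0, 9])));
```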
It is possible to send GIF, JPEG or PNG images directly as frames, as seen on page 8 of the official Flash Video Specification, but at HD resolution this will be extremely inefficient: at 25 FPS, a single 1920x1080 JPEG image is much bigger than the equivalent H.264 frame.
So, in the end, my advice is: do not decode the video on the server, make sure it is in a format compatible with Flash, and use a well-documented protocol to send it as-is.