It's been a long time since I've needed to build a streaming media solution for the web, because there are so many good services covering the basic needs these days. Ages ago I did streaming deployments with Red5 and a frontend player like Flow or JW, but despite a lot of searching I haven't been able to determine what the best modern options are for a simple web solution with good native compatibility on mobile and desktop. I'm hoping someone can suggest a good open source stack that would be a good fit for what I'm trying to do:
Mobile + Desktop Web Audio Support (with the possibility of some video at some point)
Streaming (not live on-air streaming, but the ability to seek and listen without downloading the whole file, buffering only enough for good quality so as not to waste bandwidth)
Track how much of a stream has been listened to (not the amount downloaded)
Protecting streams (this should be pretty easy either using headers or URL patterns that expire the link/connection after a certain amount of time by writing that logic into the server side, but I wanted to mention it anyway)
Lightweight, maybe 10 to 20 concurrent audio streams for now.
My initial thought, since my knowledge in the streaming space is dated, was to do it with RTMP, RTSP, or HLS and something like JW Player for the frontend, depending on which protocol seemed to have the best support across the most devices. Or, if the client-side support is good, I could probably skip the media server and go with pseudo-streaming via the web server. On the server side I could handle the access control to the streams, and on the frontend player I could use the API to keep track of the number of seconds played and just ping that data to the server in an AJAX request to track how much has actually been listened to (as opposed to downloaded/buffered).
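To make that concrete, the kind of frontend tracking I have in mind would be roughly this (the reporting endpoint and interval are just placeholders):

```js
// Count seconds actually played (not buffered) and ping them to the server periodically.
const audio = document.querySelector('audio');
let playedSeconds = 0;
let lastTime = null;

audio.addEventListener('timeupdate', () => {
  if (!audio.paused && lastTime !== null) {
    const delta = audio.currentTime - lastTime;
    // Ignore big jumps caused by seeking; only accumulate normal playback progress.
    if (delta > 0 && delta < 2) playedSeconds += delta;
  }
  lastTime = audio.currentTime;
});

setInterval(() => {
  if (playedSeconds > 0) {
    fetch('/api/played', {                       // placeholder endpoint
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ seconds: playedSeconds }),
    });
    playedSeconds = 0;
  }
}, 10000);
```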
The thing is, I have a feeling that's probably an obsolete way to go about it in 2017. Even though I haven't turned up anything specific yet, I suspect there's a better solution out there using Node and perhaps WebSockets to accomplish the same thing with the server and player coupled more tightly (the server aware of the playhead/actual amount played, rather than the frontend only), or a solution that leverages newer standards (MPEG-DASH?). Plus, in the past, getting the other streaming protocols to be compatible across all devices was a pain, and IIRC there is no universal HTML5 support for any one of the older protocols across all browsers.
So what are the best current approaches to tackle something like this? Is a media server + a front end like JW or Flow still one of the better ways to go or are there better/easier ways to deploy a solution like this?
Related
I am trying to build a web application which will need to have audio streaming functionality implemented in some way. Just to give you some context: it is designed to be a purely auditory experience/game/I-don't-know-what-to-call-it with lots of different sound assets varying in length and thus file size. The sound assets to be provided will consist of ambient sounds, spoken bits of conversation, but also long music sets (up to a couple of hours). The reason I think I won't be able to just host these audio files on some server or CDN and serve them from there is that the sound assets will need to be fetched and played dynamically (depending on user interaction) and as instantly as possible.
Most importantly, consuming larger files (like the music sets and long ambient loops) as a whole doesn't seem client-friendly at all to me (data consumption on mobile networks and client-side memory usage).
Also, without any buffering or streaming mechanism, the client won't be able to start playing these files before they are downloaded completely, right? Which would add the issue of high latencies.
I've tried to do some online research on how to properly implement a good infrastructure to stream bigger audio files to clients on the server side and found HLS and MPEG-DASH. I have some experience with consuming HLS streams with web players, and if I understand it correctly, I would use some sort of one-time transformation process (on or after file upload) to split up the files into chunks and create the playlist, and then just serve these files via HTTP. From what I understand, the process should be more or less the same for MPEG-DASH. My issue with these two techniques is that I couldn't really find any documentation on how to implement JavaScript/TypeScript clients (particularly using the Web Audio API) without reinventing the wheel. My best guess would be to use something like hls.js, bind the HLS streams to freshly created audio elements, and use those elements to create AudioSources in my Web Audio graph. How far off am I? I'm trying to get at least an idea of a best practice.
To sum up what I would really appreciate to get some clarity about:
Would HLS or MPEG-DASH really be the way to go or am I missing a more basic chunked file streaming mechanism with good libraries?
How - theoretically - would I go about limiting the amount of chunks downloaded in advance on the client side to save client-side resources, which is one of my biggest concerns?
I was looking into hosting services as well, but figured that most of them are specialized in hosting podcasts (fewer but very large files). Does anyone have an opinion about whether I could use these services to host and stream possibly thousands of files with sizes ranging from very small to rather large?
Thank you so much in advance to everyone who will be bothered with helping me out. Really appreciate it.
The reason I think I won't be able to just host these audio files on some server or CDN and serve them from there is that the sound assets will need to be fetched and played dynamically (depending on user interaction) and as instantly as possible.
Your long running ambient sounds can stream, using a normal HTMLAudioElement. When you play them, there may be a little lag time before they start since they have to begin streaming, but note that the browser will generally prefetch the metadata and maybe even the beginning of the media data.
For short sounds where latency is critical (like one-shot user interaction sound effects), load those into buffers with the Web Audio API for playback. You won't be able to stream them, but they'll play as instantly as you can get.
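A rough sketch of both approaches (the URLs are placeholders, not anything specific to your project):

```js
// Long ambient/music: let the browser stream it via an audio element.
const ambient = new Audio('/audio/ambient-loop.mp3'); // placeholder URL
ambient.loop = true;
// call ambient.play() from a user gesture handler so autoplay policies don't block it

// Short, latency-critical one-shots: decode into a buffer up front with the Web Audio API.
const ctx = new AudioContext();
let clickBuffer;

async function loadBuffer(url) {
  const res = await fetch(url);
  return ctx.decodeAudioData(await res.arrayBuffer());
}

async function init() {
  clickBuffer = await loadBuffer('/audio/click.opus'); // placeholder URL
}

function playClick() {
  const src = ctx.createBufferSource();
  src.buffer = clickBuffer;
  src.connect(ctx.destination);
  src.start(); // near-instant playback, no network round trip
}

init();
```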
Most importantly, consuming larger files (like the music sets and long ambient loops) as a whole doesn't seem client-friendly at all to me (data consumption on mobile networks and client-side memory usage).
If you want to play the audio, you naturally have to download that audio. You can't play something you haven't loaded in some way. If you use an audio element, you won't be downloading much more than what is being played. And, that downloading is mostly going to occur on-demand.
Also, without any buffering or streaming mechanism, the client won't be able to start playing these files before they are downloaded completely, right? Which would add the issue of high latencies.
If you use an audio element, the browser takes care of all the buffering and what not for you. You don't have to worry about it.
I've tried to do some online research on how to properly implement a good infrastructure to stream bigger audio files to clients on the server side and found HLS and MPEG-DASH.
If you're only streaming a single bitrate (which for audio is usually fine) and you're not streaming live content, then there's no point to HLS or DASH here.
Would HLS or MPEG-DASH really be the way to go or am I missing a more basic chunked file streaming mechanism with good libraries?
The browser will make ranged HTTP requests to get the data it needs out of the regular static media file. You don't need to do anything special to stream it. Just make sure your server is configured to handle ranged requests... most any should be able to do this right out of the box.
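If you want to sanity-check that your server honours ranged requests, a quick probe like this works (the URL is a placeholder; run it in the console or a module):

```js
// Ask for just the first kilobyte; a 206 Partial Content response means ranged
// requests are supported, a plain 200 means the server ignored the Range header.
const res = await fetch('/audio/music-set.mp3', {   // placeholder URL
  headers: { Range: 'bytes=0-1023' },
});
console.log(res.status, res.headers.get('Content-Range'));
```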
How - theoretically - would I go about limiting the amount of chunks downloaded in advance on the client side to save client-side resources, which is one of my biggest concerns?
The browser does this for you if you use an audio element. Additionally, data saving settings and the detected connectivity speed may impact whether or not the browser pre-fetches. The point is, you don't have to worry about this. You'll only be using what you need.
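If you want to nudge the browser toward fetching even less up front, the preload hint is the knob to reach for; a small sketch (element names and URL are placeholders):

```js
// Hint that nothing (or only metadata) should be fetched until playback is requested.
const audio = new Audio();
audio.preload = 'none';              // or 'metadata'; it's a hint, browsers may override it
audio.src = '/audio/music-set.mp3';  // placeholder URL

// Buffering only starts when the user actually asks for playback.
playButton.addEventListener('click', () => audio.play()); // playButton is a placeholder element
```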
Just make sure you're compressing your media as efficiently as you can for the required audio quality. Use a good codec like Opus or AAC.
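If you're doing a one-time encode step on upload anyway, driving ffmpeg from Node is one way to handle it; a rough sketch, assuming ffmpeg is installed and the filenames are placeholders:

```js
// Re-encode an uploaded file to Opus at a modest bitrate.
const { execFile } = require('child_process');

execFile('ffmpeg', [
  '-i', 'upload.wav',   // placeholder input
  '-c:a', 'libopus',
  '-b:a', '96k',        // plenty for stereo music; spoken word can go much lower
  'ambient-loop.opus',  // placeholder output
], (err) => {
  if (err) throw err;
  console.log('encoded');
});
```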
I was looking into hosting services as well, but figured that most of them are specialized in hosting podcasts (fewer but very large files). Does anyone have an opinion about whether I could use these services to host and stream possibly thousands of files with sizes ranging from very small to rather large?
Most any regular HTTP CDN will work just fine.
One final note for you... beware of iOS and Safari. Thanks to Apple's restrictive policies, all browsers under iOS are effectively Safari. Safari is incapable of playing more than one audio element at a time. If you use the Web Audio API you have more flexibility, but the Web Audio API has no real provision for streaming. You can use a media element source node, but this breaks lock screen metadata and outright doesn't work on some older versions of iOS. TL;DR: Safari is all but useless for audio on the web, and Apple's business practices have broken any alternatives.
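For completeness, the media element source node route mentioned above looks roughly like this (it works fine outside of the Safari caveats; the URL is a placeholder):

```js
// Route a streaming <audio> element through the Web Audio graph.
const ctx = new AudioContext();
const el = new Audio('/audio/music-set.mp3'); // placeholder URL
const source = ctx.createMediaElementSource(el);
source.connect(ctx.destination); // insert gain/effect nodes here if you need them
el.play(); // call from a user gesture so autoplay policies don't block it
```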
I would like to implement video recording/playback/storage capability for my website. I've done a bit of research; for HTML5 recording there is RecordRTC, which is based on WebRTC. For playback there's Video.js. I want to be able to store the recordings on S3, but I haven't figured out how.
1) Is this the best way to do it without paying for cloud-based commercial services such as Ziggeo, Nimbb, and Pipe?
2) Are there any alternatives that I should look into?
3) How does storage work after recording with RecordRTC and uploading to S3? Do I need to do any sort of compression?
Any help would be great! Really appreciate it
Video recording is the future of all websites in our eyes - and by our I mean here at Ziggeo (full disclosure, I work at Ziggeo :) ).
In regards to recording there are many ways to do it and it is up to you to go with a specific one or implement all of them, so you could do it through Flash, WebRTC (https://webrtc.org/), or ORTC (https://ortc.org/).
We are currently offering you to record using WebRTC plus fallback with Flash and are working on implementing ORTC as well.
Now, as mentioned above, there are many ways to do it and the choice is up to you; however, it also depends on your end users, since they might not be able to record over Flash due to company policy, or your website might be served over HTTP so you cannot use WebRTC, etc.
With your own implementation you need to run the numbers and combine it all together (and work on keeping it up and running), while here at Ziggeo we do that for you and keep improving our SDKs and features.
Furthermore, we also allow you to push the videos to S3 buckets, FTP, YouTube, and Facebook, and soon to Dropbox as well.
So if you are like us, you will probably want to go down the do-it-yourself road. If, however, you want to have time to work on your website, apps, and other things and just have the video part handled, I do suggest using a service.
In regards to compression: it is good to mention that we do transcoding of all videos that are uploaded to our servers (you can see more here: https://ziggeo.com/features/transcoding). The original video is kept, and next to it the transcoded video (which can have a watermark or some effects, etc., though it does not need to).
In general you want to 'standardize' the uploaded videos, since different browsers will give you different video containers, and this gives you the upper hand in that it is easier to make adjustments to them later for preview, depending on the browser being used.
To summarize:
1) This depends on what kind of recording/playback and storage you need. If it is professional, then using a service such as Ziggeo will help you focus on the important parts of your service, like website design, functionality, and similar; if it is for fun and play, you still have a free plan on Ziggeo, or you could roll your sleeves up and do some coding :)
2) I would personally look into WebRTC and ORTC if I were making the implementation myself, to see which one I would need more (or which would be easier for me to implement). Once you find the one that you like, their communities usually offer suggestions on their forums about what works best. (Be prepared, however, to need a Flash implementation at some point as well if it is a business-related setup.)
3) It is best to standardize what you store in terms of resolution, video containers, and similar, and it is often good to keep the original videos as well, so that you can always re-encode them if needed (which can happen in the early stages of development).
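If you do go the do-it-yourself route for question 3, the record-then-upload step with RecordRTC looks roughly like this (the upload endpoint is an assumption; your own backend would then sign and push the file to the S3 bucket):

```js
// Record with RecordRTC, then POST the blob to your own backend,
// which forwards the upload to S3.
import RecordRTC from 'recordrtc';

let recorder;

async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  recorder = new RecordRTC(stream, { type: 'video', mimeType: 'video/webm' });
  recorder.startRecording();
}

function stopAndUpload() {
  recorder.stopRecording(async () => {
    const blob = recorder.getBlob();
    const form = new FormData();
    form.append('video', blob, 'recording.webm');
    await fetch('/api/upload', { method: 'POST', body: form }); // hypothetical endpoint
  });
}
```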
With Chrome disabling Flash by default very soon, I need to start looking into HTML5 replacements for my Flash/RTMP solution.
Currently with Flash + RTMP I have a live video stream with < 1-2 second delay.
I've experimented with MPEG-DASH, which seems to be the new industry standard for streaming, but that came up short, with 5 seconds of delay being the best I could squeeze out of it.
For context, I am trying to allow users to control physical objects they can see on the stream, so anything above a couple of seconds of delay leads to a frustrating experience.
Are there any other techniques, or are there really no low-latency HTML5 solutions for live streaming yet?
Technologies and Requirements
The only web-based technology set really geared toward low latency is WebRTC. It's built for video conferencing. Codecs are tuned for low latency over quality. Bitrates are usually variable, opting for a stable connection over quality.
However, you don't necessarily need this low latency optimization for all of your users. In fact, from what I can gather on your requirements, low latency for everyone will hurt the user experience. While your users in control of the robot definitely need low latency video so they can reasonably control it, the users not in control don't have this requirement and can instead opt for reliable higher quality video.
How to Set it Up
In-Control Users to Robot Connection
Users controlling the robot will load a page that utilizes some WebRTC components for connecting to the camera and control server. To facilitate WebRTC connections, you need some sort of STUN server. To get around NAT and other firewall restrictions, you may need a TURN server. Both of these are usually built into Node.js-based WebRTC frameworks.
The cam/control server will also need to connect via WebRTC. Honestly, the easiest way to do this is to make your controlling application somewhat web based. Since you're using Node.js already, check out NW.js or Electron. Both can take advantage of the WebRTC capabilities already built into the browser engine, while still giving you the flexibility to do whatever you'd like with Node.js.
The in-control users and the cam/control server will make a peer-to-peer connection via WebRTC (or TURN server if required). From there, you'll want to open up a media channel as well as a data channel. The data side can be used to send your robot commands. The media channel will of course be used for the low latency video stream being sent back to the in-control users.
Again, it's important to note that the video that will be sent back will be optimized for latency, not quality. This sort of connection also ensures a fast response to your commands.
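The skeleton of that connection on the in-control user's page might look something like this (the STUN server URL and the signaling transport are assumptions, not specific recommendations):

```js
// Peer connection on the in-control user's page; the offer/answer exchange with the
// cam/control server ("signaling" below) is assumed to happen over your own web socket.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.example.com:3478' }], // placeholder STUN server
});

// Data channel for robot commands.
const control = pc.createDataChannel('control');
control.onopen = () => control.send(JSON.stringify({ cmd: 'forward', speed: 0.5 }));

// Low-latency video coming back from the camera.
pc.ontrack = (event) => {
  document.querySelector('video').srcObject = event.streams[0];
};

async function connect(signaling) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(JSON.stringify(pc.localDescription));
  // ...apply the server's answer with pc.setRemoteDescription() when it arrives.
}
```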
Video for Viewing Users
Users that are simply viewing the stream and not controlling the robot can use normal video distribution methods. It is actually very important for you to use an existing CDN and transcoding services, since you will have 10k-15k people watching the stream. With that many users, you're probably going to want your video in a couple different codecs, and certainly a whole array of bitrates. Distribution with DASH or HLS is easiest to work with at the moment, and frees you of Flash requirements.
You will probably also want to send your stream to social media services. This is another reason why it's important to start with a high quality HD stream. Those services will transcode your video again, reducing quality. If you start with good quality first, you'll end up with better quality in the end.
Metadata (chat, control signals, etc.)
It isn't clear from your requirements what sort of metadata you need, but for small message-based data, you can use a web socket library such as Socket.IO. As you scale this up to a few instances, you can use pub/sub, such as Redis, to distribute messaging throughout the servers.
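A sketch of that server side with Socket.IO and its Redis adapter (package names reflect current versions; adjust to whatever you actually deploy):

```js
// Fan metadata messages out across multiple Socket.IO instances via Redis pub/sub.
const { Server } = require('socket.io');
const { createClient } = require('redis');
const { createAdapter } = require('@socket.io/redis-adapter');

const io = new Server(3000);
const pubClient = createClient({ url: 'redis://localhost:6379' });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  io.adapter(createAdapter(pubClient, subClient));
  // Any instance can now broadcast, and every connected client receives it.
  io.emit('metadata', { type: 'chat', text: 'hello', at: Date.now() });
});
```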
To synchronize the metadata to the video depends a bit on what's in that metadata and what the synchronization requirement is, specifically. Generally speaking, you can assume that there will be a reasonable but unpredictable delay between the source video and the clients. After all, you cannot control how long they will buffer. Each device is different, each connection variable. What you can assume is that playback will begin with the first segment the client downloads. In other words, if a client starts buffering a video and begins playing it 2 seconds later, the video is 2 seconds behind from when the first request was made.
Detecting when playback actually begins client-side is possible. Since the server knows the timestamp for which video was sent to the client, it can inform the client of its offset relative to the beginning of video playback. Since you'll probably be using DASH or HLS, and you need to use MSE (Media Source Extensions) with AJAX to get the data anyway, you can use the response headers in the segment response to indicate the timestamp for the beginning of the segment. The client can then synchronize itself. Let me break this down step-by-step (there's a small code sketch after the list):
Client starts receiving metadata messages from application server.
Client requests the first video segment from the CDN.
CDN server replies with video segment. In the response headers, the Date: header can indicate the exact date/time for the start of the segment.
Client reads the response Date: header (let's say 2016-06-01 20:31:00). Client continues buffering the segments.
Client starts buffering/playback as normal.
Playback starts. The client can detect this state change on the player and knows that 00:00:00 on the video player is actually 2016-06-01 20:31:00.
Client displays metadata synchronized with the video, dropping any messages from previous times and buffering any for future times.
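In code, the heart of steps 3 through 6 is just reading the Date header off the first segment response and remembering the offset; roughly this (the segment URL and player wiring are placeholders):

```js
// When fetching the first segment yourself (as DASH/HLS JS players do via MSE),
// read the Date header to learn the wall-clock time of the start of that segment.
let segmentStartWallClock = null;

async function fetchSegment(url) {
  const res = await fetch(url); // e.g. '/live/segment-000001.m4s' (placeholder)
  if (!segmentStartWallClock) {
    segmentStartWallClock = new Date(res.headers.get('Date'));
  }
  return res.arrayBuffer(); // hand off to the MSE SourceBuffer as usual
}

// Playback position t (seconds) maps to this wall-clock time on screen:
function wallClockAt(t) {
  return new Date(segmentStartWallClock.getTime() + t * 1000);
}

// Show a metadata message only once playback reaches its timestamp.
function shouldDisplay(message, currentPlaybackTime) {
  return new Date(message.timestamp) <= wallClockAt(currentPlaybackTime);
}
```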
This should meet your needs and give you the flexibility to do whatever you need to with your video going forward.
Why not [magic-technology-here]?
When you choose low latency, you lose quality. Quality comes from available bandwidth. Bandwidth efficiency comes from being able to buffer and optimize entire sequences of images when encoding. If you wanted perfect quality (lossless for each image), you would need a ton of bandwidth (gigabits per viewer). That's why we have these lossy codecs to begin with.
Since you don't actually need low latency for most of your viewers, it's better to optimize for quality for them.
For the 2 users out of 15,000 that do need low latency, we can optimize for low latency for them. They will get substandard video quality, but will be able to actively control a robot, which is awesome!
Always remember that the internet is a hostile place where nothing works quite as well as it should. System resources and bandwidth are constantly variable. That's actually why WebRTC auto-adjusts (as best as reasonable) to changing conditions.
Not all connections can keep up with low latency requirements. That's why every single low latency connection will experience drop-outs. The internet is packet-switched, not circuit-switched. There is no real dedicated bandwidth available.
Having a large buffer (a couple seconds) allows clients to survive momentary losses of connections. It's why CD players with anti-skip buffers were created, and sold very well. It's a far better user experience for those 15,000 users if the video works correctly. They don't have to know that they are 5-10 seconds behind the main stream, but they will definitely know if the video drops out every other second.
There are tradeoffs in every approach. I think what I have outlined here separates the concerns and gives you the best tradeoffs in each area. Please feel free to ask for clarification or ask follow-up questions in the comments.
We've developed a professional WebRTC application and are trying to give users an indication of how many streams their PC can handle (2-7). Is there an easy way of figuring this out (in browser or with a separate application)?
It's a conference application we offer to users browsing with Chrome.
Another question, if you work with for example 7 streams, are they divided over the different CPU cores? Or is the whole WebRTC deal included in the process for that browser tab?
WebRTC makes extensive use of threads, so it can utilize more than one core especially in multi-party conferences.
The simplest way to check is to make calls to yourself (each one = 2 calls in a mesh conference). If it's an MCU-style conference (likely with 7 participants), you need to simulate a one-way call (so you're doing one encode), plus decode N additional VP8 streams at "appropriate" resolutions.
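A loopback call is easy to fake in a single page, which is handy for this kind of capacity probing; a rough sketch, with no signaling server involved:

```js
// Two peer connections in the same page: every extra pair adds one encode + one decode.
async function loopbackCall(stream) {
  const pcA = new RTCPeerConnection();
  const pcB = new RTCPeerConnection();

  // Both ends are local, so just hand ICE candidates straight across.
  pcA.onicecandidate = (e) => e.candidate && pcB.addIceCandidate(e.candidate);
  pcB.onicecandidate = (e) => e.candidate && pcA.addIceCandidate(e.candidate);

  stream.getTracks().forEach((t) => pcA.addTrack(t, stream));

  const offer = await pcA.createOffer();
  await pcA.setLocalDescription(offer);
  await pcB.setRemoteDescription(offer);
  const answer = await pcB.createAnswer();
  await pcB.setLocalDescription(answer);
  await pcA.setRemoteDescription(answer);

  return [pcA, pcB]; // keep the references around and watch CPU as you add more pairs
}

navigator.mediaDevices.getUserMedia({ video: true }).then(loopbackCall);
```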
This is complicated by Firefox, for example, using content analysis to selectively reduce resolution and/or frame rate of sent video depending on load and outgoing bandwidth. However, for your case, the load is more about reception and decoding.
The short answer, though, is that it's hard to be sure, and will depend on the other senders too.
I'm looking for a way to securely deliver video to mobile devices. There are two options:
HLS in a <video> tag. This works very nicely for iOS and supports adaptive bitrate, perfect for mobile. However, it seems to only work well on iOS. Support for it on Android appears fragmented. I've read that Android has officially supported it since 3.0, but on all the Android devices I've tested (>3.0), HLS hasn't played back in the browser.
Progressive download in a <video> tag. This will work fine on iOS and Android devices, but the concern is that since it's just a progressive download of the video, the user could find a way to just grab that video once the browser has finished downloading it. This may be more difficult on iOS, but I'm sure it's not that hard to figure out where the browser stored the downloaded video in a tmp folder somewhere.
Either method, I'd say, can be protected from deep-linking by using an expiring-token approach, where the token is generated server-side with a secret key that only the content server knows about. The video request would only be valid for 5 or 10 minutes, which would kill off deep-linking.
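To be concrete, the expiring-token scheme I have in mind would be roughly this on the server side (Node; the query parameter names and TTL are just placeholders):

```js
// Sign a video URL that expires after a few minutes; only the content server
// knows SECRET, so clients can't forge or extend links.
const crypto = require('crypto');
const SECRET = process.env.STREAM_SECRET; // shared secret, never sent to clients

function signUrl(path, ttlSeconds = 600) {
  const expires = Math.floor(Date.now() / 1000) + ttlSeconds;
  const sig = crypto.createHmac('sha256', SECRET)
    .update(`${path}:${expires}`)
    .digest('hex');
  return `${path}?expires=${expires}&sig=${sig}`;
}

function verifyRequest(path, expires, sig) {
  if (Math.floor(Date.now() / 1000) > Number(expires)) return false; // link expired
  const expected = crypto.createHmac('sha256', SECRET)
    .update(`${path}:${expires}`)
    .digest('hex');
  return sig.length === expected.length &&
    crypto.timingSafeEqual(Buffer.from(sig), Buffer.from(expected));
}
```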
Is anyone aware of any way around these issues? Even if I was able to prevent deeplinking, the user could still get the video itself and re-distribute. Perhaps it's just not possible?
Thanks
Rule #1 of the internet:
If you don't want someone stealing it, don't put it online.
Welcome to the circumvention arms race. Brought to you by DownloadHelper.
There's nothing you can do to stop someone who really wants to pirate your video. There are various measures, like those you mention, that make it more difficult, but someone who really wants to copy it could find a way to capture it from memory, or even just point a camera at the screen and record the playback of the video.
It's the same way you protect your car. You install a steering lock, an alarm, and an engine immobiliser, and then someone comes along, pulls the car onto a flat-bed truck, and drives away with it.
Bottom line - you can't stop a determined thief, but you can make theft more difficult so that you're not the most attractive target.
As I was reading the above, I realized I could get past all of these techniques pretty quickly.
For a project I can't describe too much because of an NDA, we created our own protocol based on a well-known encryption method (can't mention that either; military grade), encoded packets into the protocol on the server, and decoded them on the device.
Unfortunately this isn't perfect either, because a lot of mobile apps can be reverse-engineered, and once you get the key it's game over (very easy on Android). Of course, you could periodically recycle the key, in which case even if they decompiled the Android app and got the key, it wouldn't work for very long.
This is a lot of work and can't be implemented with HTML5, HLS, or even RTSP.
It also requires a custom server application that takes the video stream and re-transmits it with the custom protocol.
On the other hand, the protocol was transport-agnostic, which meant we could use a variety of transports: TCP, IAP, and Bluetooth. It would also work on all mobile/desktop platforms.
The other little requirement is that it couldn't use a browser; it had to be a custom app.