Sending and Receiving PCM Samples - html

I've been working on making a near real-time voice chat app. The webpage will send packets to the server, and the server will save the packets to disk, and then re transmit the packets to the other connected webpages. I've tried many other solutions, but they are either laggy or they do not play. I've realized that sending PCM samples would be optimal (the server will be recording these as well), but I'm not sure how to get them to play on another client's end. I'm using NodeJS with Socket.IO. Thanks in advance!

The webpage will send packets to the server, and the server will save the packets to disk, and then re transmit the packets to the other connected webpages.
Already, this is not that efficient. It's better when possible to send data directly from peer to peer.
I've realized that sending PCM samples would be optimal
No, it wouldn't. This requires more bandwidth, which is going to need better buffering, which means higher latency. This is voice chat... no need to use a lossless encoding like PCM.
I've been working on making a near real-time voice chat app.
This is basically the defacto primary use case that WebRTC was built for. If you use WebRTC, you get:
Peer-to-peer streaming (where possible)
NAT traversal (to enable those P2P connections, where possible, or proxy them when not)
Low latency optimization, from end to end
Hardware acceleration (where available)
Opus audio codec
Automatic resampling, for compatibility and to keep latency low as things drop
In other words, this is already a solved problem with WebRTC.

Related

Video and audio stream - server to clients only

Is there a way to stream a video and audio on a website just to the clients, using a camera installed on the server - for instance, like youtube does ?
I've started reading webrtc, but if I use webrtc I should create a stun/turn server and other things, which for one way stream I think is not necessary (this is just my understanding of the things..) because I don't need anything from the clients, literally, neither their video, or audio..
So is there a way to achieve this using html5, streaming just in one direction:
server (camera) -> clients
Is there something about this out there, or should I stick with webrtc ?
I'm going to explain a possible solution for this scenario, there might be others, but I hope mine gives you a rough idea of how you could do it and a start point to explore more about the amazing possibilities of WebRTC. Please let me know if something is not understood.
So, WebRTC is a free, open project that provides browsers and mobile applications with Real-Time Communications (RTC) capabilities via simple APIs. Sweet, that is: WebRTC has a quite good browser support (not in every browser though, Safari just started supporting it a month ago with Safari 11). But in this case we want to use WebRTC in the server side. At the end of the day we can still think about peer-to-peer real time communication, where one of our peers is the server.
I don't know if you are familiar with Node.js, but I recommend you to write your Server app with it (<3 Javascript!):
There are a few libraries that wrap WebRTC functionality to be used in the server side, like node-webrtc and node-rtc-peer-connection.
But I recommend you to to take a look at electron-werbrtc, since
the others might be using deprecated methods or be incomplete.
electron-webrtc runs a headless Electron client in the background to
use Chromium's built-in WebRTC implementation. So with it you should
be able to access the Camera in your server and create a stream to be
served to the other peer (the browser).
All above would be the WebRTC related tasks, in this case: streaming video peer(server)-to-peer(browser).
Now, let's talk about signaling process, stun and turn.
Signaling: imagine now a scenario peer-to-peer with 2 browsers, they want to establish a direct connection and stream video and audio between each other. But they don't know each other, like if I don't know your home address, I can't send you a letter. So they need a service that helps them know each other, so they can have the other's IP. This should be done by what is called "a signaling server". If somehow you know the other peer IP, you wouldn't need a signaling server.
STUN/TURN: the scheme above works perfectly in a local area network where each peer has its own IP address and there are no firewalls and routers between them. But otherwise, you can have peers behind a NAT or firewalls, and then your signaling server won't be able to make both peers to discover themselves. If you have peers behind a NAT, you'll need a STUN server, and if you have peers behind firewalls you'll need a TURN server. This is a bit simplified, but I just want you to have the general picture of when you might need STUN/TURN servers.
To better understand Signaling, STUN and TURN, there is a very graphic article that explains them perfectly.
Now, for your scenario:
I think you prob don't need STUN/TURN servers and also you prob don't need to implement the signaling process, because the browsers that are supposed to receive the stream from the server will know that server address, right? So they can establish a WebRTC connection with it.
EDIT: it is likely that you will need to implement some sort of handshake between the server and the clients (browsers), so this will be the signaling process. This is not part of WebRTC and this is why you need to implement it yourself. As I said, it is the way 2 peers can discover each other, but they also exchange information as their local media conditions, like codecs, resolutions they can handle, etc. For your case, your signaling server could be hosted in the same server you use to strea: you can build a small node.js app that runs there and that manages all the signaling process easily, it is not a big deal. I recommend you to read this article, and specially the section "How can I build a signaling service?". In general all WebRTC articles from that site are very helpful.
Does this make sense to you? I think with it you can start digging a little bit more and see if with this is enough or you need to implement more stuff. Hope it helps!

Low latency (< 2s) live video streaming HTML5 solutions?

With Chrome disabling Flash by default very soon I need to start looking into flash/rtmp html5 replacement solutions.
Currently with Flash + RTMP I have a live video stream with < 1-2 second delay.
I've experimented with MPEG-DASH which seems to be the new industry standard for streaming but that came up short with 5 second delay being the best I could squeeze from it.
For context, I am trying to allow user's to control physical objects they can see on the stream, so anything above a couple of seconds of delay leads to a frustrating experience.
Are there any other techniques, or is there really no low latency html5 solutions for live streaming yet?
Technologies and Requirements
The only web-based technology set really geared toward low latency is WebRTC. It's built for video conferencing. Codecs are tuned for low latency over quality. Bitrates are usually variable, opting for a stable connection over quality.
However, you don't necessarily need this low latency optimization for all of your users. In fact, from what I can gather on your requirements, low latency for everyone will hurt the user experience. While your users in control of the robot definitely need low latency video so they can reasonably control it, the users not in control don't have this requirement and can instead opt for reliable higher quality video.
How to Set it Up
In-Control Users to Robot Connection
Users controlling the robot will load a page that utilizes some WebRTC components for connecting to the camera and control server. To facilitate WebRTC connections, you need some sort of STUN server. To get around NAT and other firewall restrictions, you may need a TURN server. Both of these are usually built into Node.js-based WebRTC frameworks.
The cam/control server will also need to connect via WebRTC. Honestly, the easiest way to do this is to make your controlling application somewhat web based. Since you're using Node.js already, check out NW.js or Electron. Both can take advantage of the WebRTC capabilities already built in WebKit, while still giving you the flexibility to do whatever you'd like with Node.js.
The in-control users and the cam/control server will make a peer-to-peer connection via WebRTC (or TURN server if required). From there, you'll want to open up a media channel as well as a data channel. The data side can be used to send your robot commands. The media channel will of course be used for the low latency video stream being sent back to the in-control users.
Again, it's important to note that the video that will be sent back will be optimized for latency, not quality. This sort of connection also ensures a fast response to your commands.
Video for Viewing Users
Users that are simply viewing the stream and not controlling the robot can use normal video distribution methods. It is actually very important for you to use an existing CDN and transcoding services, since you will have 10k-15k people watching the stream. With that many users, you're probably going to want your video in a couple different codecs, and certainly a whole array of bitrates. Distribution with DASH or HLS is easiest to work with at the moment, and frees you of Flash requirements.
You will probably also want to send your stream to social media services. This is another reason why it's important to start with a high quality HD stream. Those services will transcode your video again, reducing quality. If you start with good quality first, you'll end up with better quality in the end.
Metadata (chat, control signals, etc.)
It isn't clear from your requirements what sort of metadata you need, but for small message-based data, you can use a web socket library, such as Socket.IO. As you scale this up to a few instances, you can use pub/sub, such as Redis, to distribution messaging throughout the servers.
To synchronize the metadata to the video depends a bit on what's in that metadata and what the synchronization requirement is, specifically. Generally speaking, you can assume that there will be a reasonable but unpredictable delay between the source video and the clients. After all, you cannot control how long they will buffer. Each device is different, each connection variable. What you can assume is that playback will begin with the first segment the client downloads. In other words, if a client starts buffering a video and begins playing it 2 seconds later, the video is 2 seconds behind from when the first request was made.
Detecting when playback actually begins client-side is possible. Since the server knows the timestamp for which video was sent to the client, it can inform the client of its offset relative to the beginning of video playback. Since you'll probably be using DASH or HLS and you need to use MCE with AJAX to get the data anyway, you can use the response headers in the segment response to indicate the timestamp for the beginning the segment. The client can then synchronize itself. Let me break this down step-by-step:
Client starts receiving metadata messages from application server.
Client requests the first video segment from the CDN.
CDN server replies with video segment. In the response headers, the Date: header can indicate the exact date/time for the start of the segment.
Client reads the response Date: header (let's say 2016-06-01 20:31:00). Client continues buffering the segments.
Client starts buffering/playback as normal.
Playback starts. Client can detect this state change on the player and knows that 00:00:00 on the video player is actualy 2016-06-01 20:31:00.
Client displays metadata synchronized with the video, dropping any messages from previous times and buffering any for future times.
This should meet your needs and give you the flexibility to do whatever you need to with your video going forward.
Why not [magic-technology-here]?
When you choose low latency, you lose quality. Quality comes from available bandwidth. Bandwidth efficiency comes from being able to buffer and optimize entire sequences of images when encoding. If you wanted perfect quality (lossless for each image) you would need a ton (gigabites per viewer) of bandwidth. That's why we have these lossy codecs to begin with.
Since you don't actually need low latency for most of your viewers, it's better to optimize for quality for them.
For the 2 users out of 15,000 that do need low latency, we can optimize for low latency for them. They will get substandard video quality, but will be able to actively control a robot, which is awesome!
Always remember that the internet is a hostile place where nothing works quite as well as it should. System resources and bandwidth are constantly variable. That's actually why WebRTC auto-adjusts (as best as reasonable) to changing conditions.
Not all connections can keep up with low latency requirements. That's why every single low latency connection will experience drop-outs. The internet is packet-switched, not circuit-switched. There is no real dedicated bandwidth available.
Having a large buffer (a couple seconds) allows clients to survive momentary losses of connections. It's why CD players with anti-skip buffers were created, and sold very well. It's a far better user experience for those 15,000 users if the video works correctly. They don't have to know that they are 5-10 seconds behind the main stream, but they will definitely know if the video drops out every other second.
There are tradeoffs in every approach. I think what I have outlined here separates the concerns and gives you the best tradeoffs in each area. Please feel free to ask for clarification or ask follow-up questions in the comments.

Live video streaming webrtc?

So I want to be able to let one client distribute live video feed to alot of other clients only watching. Is it possible to use WebRTC for this? Or will I basically have to go through a service like ustream or whatever.
It is possible, with a caveat. One client can establish many outgoing P2P connections to every viewer and stream video to them directly. However, for more than a very small handful of viewers this will quickly saturate the origin's bandwidth and possibly CPU. You would not be able to serve many viewers this way; however this would work entirely without middleman.*
* Safe for the WebRTC connection negotiation server.
To be able to serve a larger number of viewers you'll have to use a centralised distribution server. The source would send exactly one video stream to that server, and the server would stream it to anyone who's interested. This still requires that server to have a beefy CPU and a lot of bandwidth; but this is more realistic to scale than the client side.
You may need to spend a lot of money on such a server; have a look at c3.xlarge and better instances at AWS to get an idea. Using an established, cheap infrastructure like ustream may indeed be the more realistic option.

Stream audio and video data in network in html5

How do I achieve streaming of audio and video data and pass it on the network. I gone through a good article Here, But did not get in depth. I want to have chat application in HTML5
There are mainly below question
How to stream the audio and video data
How to pass to particular IP address.
Get that data and pass to video and audio control
If you want to serve a stream, you need a server doing so, by either downloading and installing, or coding on your own.
Streams only work in one direction, there is no responding or "retrieve back". Streaming is almost the same as downloading, with slight differences, depending on the service and use case.
Most streams are downstreams, but there are also upstreams. Did you hear about BufferStreams in PHP, Java, whatever? It's basically the same: data -> direction -> cursor.
Streams work over many protocols, even via different network layers, for example:
network/subnet broadcast, peer 2 peer, HTTP, DLNA, even FTP streams, ...
The basic nature of a stream is nothing more than data beeing sent to an audience.
You need to decide:
which protocol do you want to use for streaming
which server software
which media / source / live or with selectable start/end
which clients
The most popular HTTP streaming server is Shoutcast by Nullsoft (Winamp).
There is also DLNA which afaik is not HTTP based.
To provide more information, you need to be more specific regarding your basic requirements and decisions.

When using WebRTC, is a peer-to-peer architecture redundant to build a video chat service like Skype?

We're playing around with WebRTC and trying to understand its benefits.
One reason Skype can serve hundreds of millions of people is because of its decentralized, peer-to-peer architecture, which keeps server costs down.
Does WebRTC allow people to build a video chat application similar to Skype in that the architecture can be decentralized (i.e., video streams are not routed from a broadcaster through a central server to listeners but rather routed directly from broadcaster to listener)?
Or, put another way, does WebRTC allow someone to essentially replicate the benefits of a P2P architecture similar to Skype's?
Or do you still need something similar to Skype's P2P architecture?
Yes, that's basically what WebRTC does. Calls using the getPeerConnection() API don't send voice/video data through a centralized server, but rather use firewall traversal protocols like ICE, STUN and TURN to allow a direct, peer-to-peer connection. However, the initial call setup still requires a server (most likely something running a WebSocket implementation, but it could be anything that you can figure out how to get JavaScript to talk to), so that the two clients can figure out that they're both online, signal that they want to connect, and then figure out how to do it (this is where the ICE/STUN/TURN bit comes in).
However, there's more to Skype's P2P architecture than just passing voice/video data back and forth. The majority of Skype's IP isn't in the codecs or protocols (much of which they licensed from Global IP Solutions, which Google purchased two years ago and then open-sourced, and which forms of the basis of Chrome's WebRTC implementation). Skype's real IP is all in the piece of WebRTC which still depends on a server: figuring out which people are online, and where they are, and how to get a hold of them, and doing that in a massively decentralized fashion. (See here for some rough details.) I think that you could probably use the DataStream portion of the getPeerConnection() API to do that sort of thing, if you were really, really smart - but it would be complicated, and would most likely stomp on a few Skype patents. Unless you want to be really, really huge, you'd probably just want to run your own centralized presence and location servers and handle all that stuff through standard WebSockets.
I should note that Skype's network architecture has changed since it was created; it no longer (from what I hear) uses random users as supernodes to relay data from client 1 to client 2; it didn't scale well and caused rampant variability in results (and annoyed people who had non-firewalled connections and bandwidth).
You definitely can build something SKype-like with WebRTC - and more. :-)