How does chrome://webrtc-internals measure the round trip time? - google-chrome

I have been analyzing the JSON dump generated by chrome://webrtc-internals while running WebRTC on two PCs.
I looked at the Stats API to verify how webrtc-internals computes the round trip time (RTT).
I found 2 ways:
RTCRemoteInboundRtpStreamStats (remote inbound RTP video stream), which contains roundTripTime
RTCIceCandidatePairStats (ICE candidate pair), which contains currentRoundTripTime.
Which one is accurate, why, and how is it computed?
Is RTT computed on a frame-by-frame basis?
Is it computed one-way (sender --> receiver) or two-way (sender --> receiver --> sender)?
Which reports are used to measure the RTT: the RTCP Receiver Report or the RTCP Sender Report?
What is the GOP length in the WebRTC VP8 codec?

RTCIceCandidatePairStats.currentRoundTripTime is computed from how long it takes the remote peer to respond to a STUN Binding Request. The WebRTC ICE agent sends these on an interval, and each message has a transaction ID so the response can be matched to its request.
RTCRemoteInboundRtpStreamStats.roundTripTime is computed from RTCP. The sender emits a Sender Report carrying a timestamp; the remote peer echoes that timestamp (LSR) plus the delay since it received it (DLSR) in its Receiver Report. Since the sender knows when the Receiver Report arrived, it can compute the round trip as arrival time minus LSR minus DLSR, so the measurement is two-way and per RTCP report interval, not per frame.
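To make that concrete, here is the RFC 3550 round-trip arithmetic as a small sketch (an illustration of the formula, not Chrome's actual code); all three inputs are 32-bit values in units of 1/65536 of a second:

    // RTT as the Sender Report / Receiver Report exchange lets the sender
    // compute it (RFC 3550, section 6.4.1). Values are in "NTP short" units.
    function rtcpRoundTripSeconds(
      arrival: number, // when the Receiver Report reached the sender
      lsr: number,     // "last SR timestamp" echoed by the receiver
      dlsr: number,    // "delay since last SR" reported by the receiver
    ): number {
      // RTT = A - LSR - DLSR; >>> 0 keeps the subtraction modulo 2^32.
      return ((arrival - lsr - dlsr) >>> 0) / 65536;
    }

    // Hypothetical example values: the computed RTT here is ~0.05 s.
    console.log(rtcpRoundTripSeconds(70779, 65536, 1966));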
They are both accurate. Personally I use the ICE stats, since there is less overhead: the STUN packet doesn't have to be decrypted and routed through the RTCP subsystem. In my opinion, ICE is also easier to deal with than RTCP.
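If you want to read both values yourself instead of digging through the webrtc-internals dump, a minimal getStats() sketch (assuming you pass in your own RTCPeerConnection) looks roughly like this:

    // Reads both RTT values that webrtc-internals displays.
    async function logRoundTripTimes(pc: RTCPeerConnection): Promise<void> {
      const report = await pc.getStats();
      report.forEach((stats: any) => {
        if (stats.type === "candidate-pair" && stats.nominated) {
          // STUN-based RTT of the ICE candidate pair in use.
          console.log("ICE currentRoundTripTime:", stats.currentRoundTripTime);
        }
        if (stats.type === "remote-inbound-rtp") {
          // RTCP-based RTT, reported per RTP stream (audio and video separately).
          console.log(stats.kind, "remote-inbound roundTripTime:", stats.roundTripTime);
        }
      });
    }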
As for the GOP length of the WebRTC VP8 codec: it depends on what is being encoded and on the settings. Do you have a low keyframe interval? Are you encoding something with lots of changes? What are you trying to determine with this question?

Related

Webrtc behavior Nack & FEC

We have a WebRTC application with two peers, and I experience packet loss of around 5% (checked in webrtc-internals) while the call is ongoing. I see NACKs as well.
I want to know whether FEC is being used in my setup. I do see some FEC-related SDP parameters, as below, but I'm not sure whether they are actually used or not.
How can I check whether WebRTC is using FEC?
a=rtpmap:124 red/90000
a=rtpmap:123 ulpfec/90000
Also, are there any suggestions on how to improve the packet loss percentage by tweaking NACK or FEC, etc.?
I tried different bandwidths and resolutions, and the packet loss is almost the same.
The easiest way to determine whether FEC is actually used is to run a packet capture with Wireshark or tcpdump and look for RTP packets whose payload type matches the values in the SDP (123 and 124 in your example). If you see these packets, you're seeing FEC.
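As a complementary check from inside the page (a sketch only; it assumes you can reach your RTCPeerConnection from script), browsers that implement the stats spec also expose FEC counters on the inbound-rtp entries, so non-zero values there likewise indicate FEC is being received:

    async function logFecCounters(pc: RTCPeerConnection): Promise<void> {
      const report = await pc.getStats();
      report.forEach((stats: any) => {
        if (stats.type === "inbound-rtp" && stats.kind === "video") {
          // Counters are only meaningful in browsers that populate them.
          console.log("fecPacketsReceived:", stats.fecPacketsReceived,
                      "fecPacketsDiscarded:", stats.fecPacketsDiscarded);
        }
      });
    }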
One thing to note: FEC can make packet loss worse in some cases, essentially when you lose bursts of back-to-back packets because of congestion. FEC transmits additional packets, which allows any one or two packets in a group to be lost and recovered from the extra packets, but those extra packets also add load to an already congested link.
Found the root cause of the packet loss. It was related to the setup on the network switches. We are using a dedicated leased line, and the leased line expects a fixed 100 Mbps full-duplex configuration instead of auto-negotiation on the switch ports. Because of auto-negotiation, the link went into half duplex, and hence the FEC errors.

Max bitrate value for Google chrome browser

I have a simple question.
What is the current maximum bitrate supported by the Google Chrome browser for a web camera?
For example, if I have a virtual source with a high-bitrate output (a constant bitrate of 50 Mbit/s),
would I be able to get all 50 Mbit/s in my Chrome browser when using this device?
Thank you.
The camera's bitrate is irrelevant in this case, since WebRTC is going to re-encode the stream with a video codec that compresses it anyway.
What matters for WebRTC are four separate parameters:
1. The resolution supplied and the one the other end of the session is capable of receiving
2. The frame rate supplied and the one the other end of the session is capable of receiving
3. The network conditions: there is a limit enforced by the network, and it is dynamic in nature, so WebRTC will try to estimate it at all times and adapt to it
4. The maximum bitrate imposed by the participants
By its nature, WebRTC will not limit the amount of bandwidth it takes and will try to use as much as it possibly can. That said, the actual bitrate used even without any limits will still depend on (1), (2), and the type of codec being used. It won't reach 50 Mbps...
For the most part, 2.5 Mbps will be enough for almost any type of content in WebRTC. 1080p will take up to 4 Mbps, and 4K probably around 15 Mbps.
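For point 4, the cap imposed by the participants is typically set through RTCRtpSender.setParameters(); a minimal sketch (assuming a peer connection with an already-negotiated video sender):

    // Cap (or raise) the video encoder's target bitrate from the application.
    async function capVideoBitrate(pc: RTCPeerConnection, maxBitrateBps: number): Promise<void> {
      const sender = pc.getSenders().find((s) => s.track?.kind === "video");
      if (!sender) return;
      const params = sender.getParameters();
      if (!params.encodings?.length) return; // nothing negotiated yet
      params.encodings[0].maxBitrate = maxBitrateBps; // e.g. 4_000_000 for ~4 Mbps (1080p)
      await sender.setParameters(params);
    }

Removing the cap (or setting it high) is how you let the "as much as it possibly can" behavior play out; in practice, items 1-3 will still keep a camera feed well below 50 Mbps.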

How should PLI packets be used in WebRTC video recording

We are using the licode MCU to stream video from Google Chrome to the server and record it. The tricky part here is that there is only one Chrome browser involved, so the server-side code has to handle sending feedback to the client.
We added server-side code to send REMB (bandwidth) packets every 5 seconds to the client. This causes the client to increase bitrate so that the video quality is good.
We did something similar with PLI packets to try to improve video quality, because the recorded video had blocky artifacts and didn't look good. The current code sends a PLI every 0.8 seconds, which causes the client to send a keyframe (a full frame of video). Forcing keyframes fixes the poor video quality, but when there is packet loss (Wi-Fi network) it quickly gets bad again.
My question is how should these PLI packets be used?
I think PLI means:
PLI - Picture Loss Indication
Your application should send at least three kinds of RTCP feedback:
an accurate receiver report (RFC 3550) every second or so, indicating packet loss and jitter rates to the sender; this will cause the sender to adapt its throughput to the link characteristics;
a generic NACK (RFC 4585) whenever it misses a packet; this will avoid corruption by causing the sender to resend any packet that is lost;
a PLI (RFC 4585) whenever it hasn't seen a keyframe in a given interval, for example two seconds.
Sending REMB is only necessary to limit throughput if it grows too fast, for example if the feedback provided in receiver reports is inaccurate.
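As a sketch of that PLI policy for the recording scenario above (the sendPli and keyframe hooks are hypothetical stand-ins for whatever your MCU, licode in this case, actually exposes):

    // Hypothetical hook: emit an RTCP PLI toward the browser. Not a real licode API.
    declare function sendPli(): void;

    const KEYFRAME_TIMEOUT_MS = 2000; // "a given interval, for example two seconds"
    let lastKeyframeAt = Date.now();

    // Call this from wherever the recorder detects a decoded/written keyframe.
    function onKeyframeReceived(): void {
      lastKeyframeAt = Date.now();
    }

    // Instead of a blind PLI every 0.8 s, only ask for a new picture when the
    // stream has gone too long without one (e.g. after loss corrupted a frame).
    setInterval(() => {
      if (Date.now() - lastKeyframeAt > KEYFRAME_TIMEOUT_MS) {
        sendPli();
        lastKeyframeAt = Date.now(); // avoid flooding PLIs while waiting
      }
    }, 500);

The point is that the timeout-driven PLI only kicks in when loss has actually left the decoder without a usable picture, instead of forcing a keyframe (and a bitrate spike) every 0.8 seconds.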

RTP fragmentation vs UDP fragmentation

I don't understand why we bother fragmenting at the RTP level if the UDP (or IP) layer already does fragmentation.
As I understand it, say we are on an Ethernet link with an MTU of 1500 bytes.
If I have to send, for example, 3880 bytes, fragmenting at the IP layer would result in 3 packets of 1500, 1500, and 940 bytes respectively (the IP header is 20 bytes, so the total overhead is 60 bytes).
If I do it at the UDP layer, the overhead will be 84 bytes (3 × 28 bytes).
At the RTP layer, it's 120 bytes of overhead.
At the H264/NAL packetization layer, it's 3 more bytes (so 123 bytes final) for FU-A mode.
For such a small payload, that is a final increase of 3.1% over the initial size, while at the IP layer it would only waste 1.5% overall.
Is there any valid reason to bother with such complex packetization rules at the RTP layer, knowing it will always be worse than lower-layer fragmentation?
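For concreteness, the question's per-layer accounting can be reproduced with a small sketch (it mirrors the simplified counting above; standard header sizes: IPv4 20 B, UDP 8 B, RTP 12 B):

    const payload = 3880; // bytes to send
    const mtu = 1500;     // Ethernet

    // Total header overhead if every packet repeats `headerBytes` of headers.
    function overhead(headerBytes: number): number {
      const packets = Math.ceil(payload / (mtu - headerBytes));
      return packets * headerBytes;
    }

    console.log(overhead(20));          // IP fragmentation: 3 × 20 = 60 B (~1.5%)
    console.log(overhead(20 + 8));      // splitting at UDP:  3 × 28 = 84 B
    console.log(overhead(20 + 8 + 12)); // splitting at RTP:  3 × 40 = 120 B (~3.1% with the FU-A bytes)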
Except for the first fragment, fragmented IP traffic does not contain the source or destination port numbers; instead, the fragments are glued together using the IP identification field and fragment offsets. This makes it impossible for stateless intermediate network devices (switches and routers) to classify the traffic when they need to re-apply QoS (because the 802.1p or DSCP markings were cleared by another device or never existed in the first place). Unless the device has the resources to manage per-session state, it either has to risk rate-limiting/prioritizing fragments from unrelated streams, or not prioritize any fragments, some of which may carry voice/video.
AFAIK RTP packets never get IP-fragmented unless the network has MTU mismatches in it. Hence every packet carries a UDP header with source and destination port numbers, so if you can tame your clients to use known port ranges, you can re-establish QoS markings based on that information, pass IP fragments as vanilla traffic, and not worry about dropping voice/video data.
RTP is designed with UDP in mind.
Applications typically run RTP on top of UDP to make use of its
multiplexing and checksum services; both protocols contribute parts of
the transport protocol functionality.
However, the services RTP adds on top of raw UDP, such as the ability to detect packet reordering and loss and to reconstruct timing, require that the UDP payload carry this service information alongside the media data.
The Internet, like other packet networks, occasionally loses and
reorders packets and delays them by variable amounts of time. To cope
with these impairments, the RTP header contains timing information
and a sequence number that allow the receivers to reconstruct the
timing produced by the source, so that in this example, chunks of
audio are contiguously played out the speaker every 20 ms. This
timing reconstruction is performed separately for each source of RTP
packets in the conference. The sequence number can also be used by
the receiver to estimate how many packets are being lost.
RTP is also designed to be extensible: common headers plus data-specific payload formats:
RTP is a protocol framework that is deliberately not complete. This document specifies those functions expected to be common across all the applications for which RTP would be appropriate. Unlike conventional protocols in which additional functions might be accommodated by making the protocol more general or by adding an option mechanism that would require
parsing, RTP is intended to be tailored through modifications and/or additions to the headers as needed.
All quotes are from RFC 1889 "RTP: A Transport Protocol for Real-Time Applications".
That is, the RTP overhead for an H.264 stream is not just wasted bandwidth. The RTP headers and the H.264 payload format make it possible, at moderate cost, to handle streamed video data more reliably, and at the same time to build on a well-defined specification that works for many different kinds of data.
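To see what that moderate cost buys, here is the fixed 12-byte RTP header (RFC 3550) as a small parsing sketch; the sequence number and timestamp are exactly the fields the quotes above rely on:

    interface RtpHeader {
      version: number;
      padding: boolean;
      extension: boolean;
      csrcCount: number;
      marker: boolean;
      payloadType: number;
      sequenceNumber: number;
      timestamp: number;
      ssrc: number;
    }

    function parseRtpHeader(buf: Uint8Array): RtpHeader {
      const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
      return {
        version: buf[0] >> 6,
        padding: (buf[0] & 0x20) !== 0,
        extension: (buf[0] & 0x10) !== 0,
        csrcCount: buf[0] & 0x0f,
        marker: (buf[1] & 0x80) !== 0,
        payloadType: buf[1] & 0x7f,
        sequenceNumber: view.getUint16(2), // detects loss and reordering
        timestamp: view.getUint32(4),      // reconstructs media timing
        ssrc: view.getUint32(8),           // identifies the source
      };
    }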
I'd like to add that a lot of RTP servers/senders go about sending split datagrams inefficiently.
They use a lot of malloc/free in dynamic buffer contexts.
They also issue one syscall per part of the message instead of using vectored I/O (message vectors).
To add insult to injury, they usually do a lot of time calculation and other handling between sending the parts of the datagram.
This causes even more syscalls, sometimes even stretching one packet over a long time, because they have no upper bound on when the packet should be finished, only that it is finished before the next batch of packets is sent.
Inefficient behavior like this gets seriously in the way if you want to scale throughput or run on a low-power embedded CPU. For bandwidth, network, and CPU efficiency reasons, it's usually far better to hand the entire datagram to the kernel in one go and let it deal with fragmentation, rather than have userspace try to figure it out.
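A minimal Node.js sketch of that last point (the address, port, and sizes are placeholders): assemble the datagram from its pieces, hand it to the kernel in a single call, and let IP fragmentation happen below:

    import { createSocket } from "node:dgram";

    const socket = createSocket("udp4");
    const rtpHeader = Buffer.alloc(12);   // placeholder 12-byte RTP header
    const payload = Buffer.alloc(3880);   // placeholder media payload

    // One call, one datagram: the pieces go out together, and the kernel
    // IP-fragments the 3892-byte datagram if the path MTU requires it.
    socket.send([rtpHeader, payload], 5004, "198.51.100.10", (err) => {
      if (err) console.error(err);
      socket.close();
    });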
Well, after a lot of thinking about this, there is no reason not to rely on IP-level fragmentation for packets up to 64 kB (and this will happen if you have many NAL units with the same timestamp that you need to aggregate, via STAP-A for example).
RFC 6184 is clear: you can carry up to 64 kB of NAL data this way, since a 2-byte (16-bit) size field is prepended to each NAL unit, although staying below the MTU is preferred.
What happens if the cumulative size of the same-timestamp NAL units is larger than 64 kB? RFC 6184 does not say, but I guess you then have to send all the NAL units as separate FU-A packets without increasing the timestamp between them (this is the only case where the Start/End bits in the FU-A header are really useful, since there is no longer a 1:1 match between the End bit and RTP's marker bit).
The RFC states:
An aggregation packet can
carry as many aggregation units as necessary; however, the total
amount of data in an aggregation packet obviously MUST fit into an IP
packet, and the size SHOULD be chosen so that the resulting IP packet
is smaller than the MTU size
When a "single NAL per frame" is larger than the MTU (for example, 1460 bytes with Ethernet), it has to be split with a fragmentation unit packetization (for example, FU-A).
However, nothing in the RFC states that the limit should be 1460 bytes. And it makes sense to have larger than that when doing Ethernet only streaming (as computed above)
If you have a NAL unit larger than 64kB, then you must use FU-A to send it since you can not fit this in a single IP datagram.
The RFC states:
This payload type allows fragmenting a NAL unit into several RTP
packets. Doing so on the application layer instead of relying on
lower-layer fragmentation (e.g., by IP) has the following advantages:
o The payload format is capable of transporting NAL units bigger
than 64 kbytes over an IPv4 network that may be present in pre-
recorded video, particularly in High-Definition formats (there is
a limit of the number of slices per picture, which results in a
limit of NAL units per picture, which may result in big NAL
units).
o The fragmentation mechanism allows fragmenting a single NAL unit
and applying generic forward error correction as described in
Section 12.5.
Which I understand as: "If your NAL unit is less than 64 kB and you don't care about FEC, then don't use FU-A; use a single RTP packet for it."
Another case where FU-A is necessary is an H.264 stream carried with RTP over RTSP (interleaved mode). The "packet" size must fit in 2 bytes (16 bits), so you must also fragment larger NAL units even though they are sent over a reliable stream socket.
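Summing up this answer's reading of RFC 6184 as a small decision sketch (the 64 kB figure is approximate, since the UDP/IP and RTP headers also count toward the 65535-byte datagram limit):

    type Packetization = "single-nal-or-stap-a" | "fu-a";

    function choosePacketization(nalBytes: number, mtu: number): Packetization {
      // Above ~64 kB neither a single UDP/IP datagram nor a single interleaved
      // RTSP frame (16-bit length field) can carry it: FU-A is mandatory.
      if (nalBytes > 0xffff) return "fu-a";
      // At or below the MTU nothing needs to be split at all.
      if (nalBytes <= mtu) return "single-nal-or-stap-a";
      // Between the MTU and ~64 kB the RFC only SHOULD-s staying under the MTU;
      // this answer's conclusion: a single RTP packet (relying on IP
      // fragmentation) is acceptable if you don't need per-fragment FEC,
      // otherwise use FU-A.
      return "single-nal-or-stap-a";
    }

    console.log(choosePacketization(900, 1460));     // "single-nal-or-stap-a"
    console.log(choosePacketization(3880, 1460));    // "single-nal-or-stap-a" (IP fragments)
    console.log(choosePacketization(120_000, 1460)); // "fu-a"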

TCP Slow Start, Congestion Avoidance & Determining Bandwidth

Is there a formula somewhere that can be used to determine the minimum number of segments / bytes which need to be transferred across a TCP connection to determine its bandwidth, and which takes into account Slow Start and Congestion Avoidance? I'm aware of the pathrate tool, but if possible I want something a bit simpler that I can incorporate in an app to get a decent ballpark figure. One example of usage would be downloading some data from a web server in order to determine the optimum number of threads for downloading a bunch of small files automatically. This is related to a previous question I posted: TCP, HTTP and the Multi-Threading Sweet Spot
You can fire up scholar.google.com and search for "TCP chirp". However, that requires high-resolution timers, and unless you write a kernel TCP congestion-control algorithm, you'd have to reimplement TCP in userspace. That by itself will probably not give good results (general-purpose OSes are not very good at real-time, high-resolution timing from userspace).
In theory, using TCP chirp you need as few as 4-5 segments (typically you'd get better resolution with a longer train of segments) to determine the "optimal" bandwidth.
In any case, since you cannot know which path is used (e.g., a satellite link, or a TV broadcast in the forward direction), you may need a considerable amount of data (10+ MB, perhaps even 1 GB) to get a decent measurement over arbitrary paths. Satellites can have many dozens of MB/s of bandwidth but also latencies in the 1000-3000 ms range, and TCP takes a number of round-trip times to open up cwnd (I'd say around 10 RTTs before a measurement should be started).
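As a rough back-of-the-envelope sketch of that ramp-up cost, assuming classic slow start (the congestion window doubling every RTT) from an initial window of 10 segments (RFC 6928) and a 1460-byte MSS; the example numbers are hypothetical:

    // Estimate how many bytes a connection sends while its window is still
    // doubling toward the bandwidth-delay product, i.e. a floor on how much
    // data must be transferred before a steady-state measurement can start.
    function slowStartRampBytes(linkMbps: number, rttMs: number, mss = 1460, initCwnd = 10): number {
      const bdpSegments = (linkMbps * 1e6 / 8) * (rttMs / 1000) / mss;
      let cwnd = initCwnd;
      let sent = 0;
      while (cwnd < bdpSegments) {
        sent += cwnd * mss; // one RTT's worth of data at the current window
        cwnd *= 2;          // slow start: double per RTT
      }
      return sent;
    }

    // Hypothetical satellite-ish path: 50 Mbit/s with a 200 ms RTT needs
    // roughly 1.9 MB and ~7 RTTs (~1.4 s) just to open the window.
    console.log(slowStartRampBytes(50, 200));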
I do not think that there is a fixed number of bytes required to be sent to determine the bandwidth. This number can depend on network type and speed.
Bandwidth is a measure of some resource transferred over a time interval. To get real data, you need to measure it. Here are some hints on how to do that.