Understanding WebSocket Frames in Chrome

When inspecting WebSocket frames in Chrome's debug console, is the length field measuring the payload in bytes?
Obviously, it's the length of the message. But each character is one byte, right? If that is true, is it safe to say that in my screenshot 56 and 53 bytes were sent?

Yes, the length reported in Chrome is the length of the payload in bytes.
There is some additional overhead in the message itself beyond what the payload length reports (both WebSocket frame overhead and TCP/IP overhead), though the framing is fairly efficient. You can see the WebSocket frame format here.
In your screenshot, 53 and 56 bytes of message payload were sent, but something a little larger than that went over the actual wire. You could count the characters in the data it reports was sent, and that count should match the reported length (assuming single-byte characters; multi-byte UTF-8 characters count as more than one byte). Keep in mind that TCP is a reliable protocol, so there is extra TCP/IP traffic related to the reliable delivery of any packet, including ACKs sent back to confirm delivery, sequence numbers, and so on, but that extra data is relatively small.
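As a rough illustration, the per-message framing overhead can be estimated from the WebSocket frame format; this is only a sketch based on the framing rules in RFC 6455 (no extensions such as compression taken into account):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Rough per-frame overhead for a WebSocket data frame, per RFC 6455:
 * 2 bytes of base header, an extended payload-length field for larger
 * payloads, and a 4-byte masking key for client-to-server frames. */
static size_t ws_frame_overhead(uint64_t payload_len, int client_to_server)
{
    size_t overhead = 2;                 /* FIN/opcode byte + mask/length byte */
    if (payload_len > 65535)
        overhead += 8;                   /* 64-bit extended payload length */
    else if (payload_len > 125)
        overhead += 2;                   /* 16-bit extended payload length */
    if (client_to_server)
        overhead += 4;                   /* masking key */
    return overhead;
}

int main(void)
{
    /* The 53- and 56-byte payloads from the question. */
    printf("53-byte payload: %zu bytes of frame overhead (client->server)\n",
           ws_frame_overhead(53, 1));
    printf("56-byte payload: %zu bytes of frame overhead (server->client)\n",
           ws_frame_overhead(56, 0));
    return 0;
}
```

TCP/IP headers come on top of that, but as noted above the extra data is relatively small compared to the payload.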

Related

How does chrome://webrtc-internals measure the round-trip time?

I have been analyzing the JSON file generated by chrome://webrtc-internals while running WebRTC on two PCs.
I looked at the Stats API to verify how webrtc-internals computes the round-trip time (RTT).
I found two ways:
the RTC Remote Inbound RTP Video Stream, which contains roundTripTime;
the RTC IceCandidate Pair, which contains currentRoundTripTime.
Which one is accurate, why, and how is it computed?
Is RTT computed on a frame-by-frame basis?
Is it computed one way (sender --> receiver) or two ways (sender --> receiver --> sender)?
Which reports are used to measure the RTT? Is it the RTCP Receiver Report or the RTCP Sender Report?
What is the length of a GOP in the WebRTC VP8 codec?
RTCIceCandidatePairStats.currentRoundTripTime is computed from how long it takes the remote peer to respond to a STUN Binding Request. The WebRTC ICE agent sends these on an interval, and each message has a TransactionID.
RTCRemoteInboundRtpStreamStats.roundTripTime is computed from the RTCP reports: the remote peer reports how long it has been since the last SenderReport it received, and since the sender knows when it sent that report, it can compute how long the round trip took.
They are both accurate. Personally, I use the ICE stats since there is less overhead: the packet doesn't have to be decrypted and routed through the RTCP subsystem. IMO, ICE is also easier to deal with than RTCP.
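To make the ICE case concrete, here is a minimal sketch of the mechanism only (hypothetical helper names, not Chrome's actual code): the agent remembers when each Binding Request's TransactionID went out and computes the RTT when the matching Binding Response arrives.

```c
#include <stdint.h>
#include <string.h>
#include <time.h>

#define STUN_TID_LEN 12   /* STUN transaction IDs are 96 bits (RFC 5389) */
#define MAX_PENDING  64

/* One outstanding STUN Binding Request: its TransactionID and send time. */
struct pending_request {
    uint8_t         tid[STUN_TID_LEN];
    struct timespec sent_at;
    int             in_use;
};

static struct pending_request pending[MAX_PENDING];

/* Record the send time when a Binding Request goes out. */
void on_binding_request_sent(const uint8_t tid[STUN_TID_LEN])
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            memcpy(pending[i].tid, tid, STUN_TID_LEN);
            clock_gettime(CLOCK_MONOTONIC, &pending[i].sent_at);
            pending[i].in_use = 1;
            return;
        }
    }
}

/* When the Binding Response with the same TransactionID comes back,
 * the elapsed time is the round-trip time (in seconds), or -1.0 if
 * the TransactionID is unknown. */
double on_binding_response(const uint8_t tid[STUN_TID_LEN])
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use &&
            memcmp(pending[i].tid, tid, STUN_TID_LEN) == 0) {
            pending[i].in_use = 0;
            return (now.tv_sec - pending[i].sent_at.tv_sec) +
                   (now.tv_nsec - pending[i].sent_at.tv_nsec) / 1e9;
        }
    }
    return -1.0;
}
```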
What is the length of a GOP in the WebRTC VP8 codec? It depends on what is being encoded and the settings. Do you have a low keyframe interval? Are you encoding something with lots of changes? What are you trying to determine with this question?

RTP fragmentation vs UDP fragmentation

I don't understand why we bother fragmenting at the RTP level if the UDP (or IP) layer does the fragmentation.
As I understand it, let's say we are on an Ethernet link where the MTU is 1500 bytes.
If I have to send, for example, 3880 bytes, fragmenting at the IP layer would result in 3 packets of 1500, 1500, and 940 bytes respectively (the IP header is 20 bytes, so the total overhead is 60 bytes).
If I do it at the UDP layer, the overhead will be 84 bytes (3 × 28 bytes).
At the RTP layer, it's 120 bytes of overhead.
At the H.264/NAL packetization layer, it's 3 more bytes (so 123 bytes in total) for FU-A mode.
For such a small payload, that is a final increase of about 3.1% over the initial size, while at the IP layer it would only waste about 1.5% overall.
Is there any valid reason to bother with such complex packetization rules at the RTP layer, knowing it will always be worse than lower-layer fragmentation?
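(For reference, the overhead arithmetic above can be reproduced with a short sketch, assuming a 20-byte IPv4 header with no options, an 8-byte UDP header, a 12-byte fixed RTP header, and roughly 1 extra byte per fragment for FU-A.)

```c
#include <stdio.h>

/* Reproduce the overhead comparison for a 3880-byte payload split into
 * 3 packets: IP fragmentation vs. UDP datagrams vs. RTP packets vs.
 * RTP + FU-A packetization. */
int main(void)
{
    const int payload   = 3880;
    const int fragments = 3;
    const int ip_hdr  = 20;   /* IPv4 header, no options */
    const int udp_hdr = 8;
    const int rtp_hdr = 12;   /* fixed RTP header, no CSRCs/extensions */
    const int fua_hdr = 1;    /* approximate net per-fragment cost of FU-A */

    int ip_only = fragments * ip_hdr;
    int udp_lvl = fragments * (ip_hdr + udp_hdr);
    int rtp_lvl = fragments * (ip_hdr + udp_hdr + rtp_hdr);
    int fua_lvl = rtp_lvl + fragments * fua_hdr;

    printf("IP fragmentation : %d bytes (%.2f%%)\n", ip_only, 100.0 * ip_only / payload);
    printf("UDP datagrams    : %d bytes (%.2f%%)\n", udp_lvl, 100.0 * udp_lvl / payload);
    printf("RTP packets      : %d bytes (%.2f%%)\n", rtp_lvl, 100.0 * rtp_lvl / payload);
    printf("RTP + FU-A       : %d bytes (%.2f%%)\n", fua_lvl, 100.0 * fua_lvl / payload);
    return 0;
}
```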
Except for the first fragment, fragmented IP traffic does not contain the source or destination port numbers; instead, it glues packets together using sequence IDs. This makes it impossible for stateless intermediate network devices (switches and routers) to re-install QoS markings (because .1p or DSCP flags were cleared by another device or never existed in the first place). Unless the device has the resources to manage per-session state, it either has to risk rate-limiting/prioritizing fragments from unrelated streams, or not prioritize any fragments, some of which may be voice/video.
AFAIK, RTP packets never IP-fragment unless the network has MTU mismatches in it. Hence each UDP datagram carries source and destination port numbers, so if you can tame your clients into using known port ranges, you can re-establish QoS markings based on this information, and you can pass IP fragments as vanilla traffic without worrying about dropping voice/video data.
RTP is designed with UDP in mind.
Applications typically run RTP on top of UDP to make use of its
multiplexing and checksum services; both protocols contribute parts of
the transport protocol functionality.
However, the RTP services added on top of raw UDP, such as the ability to detect packet reordering and losses and to recover timing, require that the UDP data carry not just the media payload but also service information.
The Internet, like other packet networks, occasionally loses and
reorders packets and delays them by variable amounts of time. To cope
with these impairments, the RTP header contains timing information
and a sequence number that allow the receivers to reconstruct the
timing produced by the source, so that in this example, chunks of
audio are contiguously played out the speaker every 20 ms. This
timing reconstruction is performed separately for each source of RTP
packets in the conference. The sequence number can also be used by
the receiver to estimate how many packets are being lost.
RTP is also designed to be extensible, with common headers and payload-format-specific data:
RTP is a protocol framework that is deliberately not complete. This document specifies those functions expected to be common across all the applications for which RTP would be appropriate. Unlike conventional protocols in which additional functions might be accommodated by making the protocol more general or by adding an option mechanism that would require
parsing, RTP is intended to be tailored through modifications and/or additions to the headers as needed.
All quotes are from RFC 1889 "RTP: A Transport Protocol for Real-Time Applications".
That is, the RTP overhead for an H.264 stream is not just wasted bandwidth. RTP headers and H.264 payload formatting make it possible, at moderate cost, to handle streamed video data more reliably, and at the same time to leverage a specification that is well defined and works for different sorts of data.
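To make that "service information" concrete, here is a sketch of the fixed RTP header fields that carry the sequence number and timestamp referred to in the quote; the field layout is as defined in the RFC, the parsing code itself is only illustrative:

```c
#include <stdint.h>
#include <stddef.h>

/* Fields of the fixed 12-byte RTP header (RFC 1889/3550) that let a
 * receiver detect reordering/loss (sequence number) and reconstruct
 * timing (timestamp). */
struct rtp_header {
    uint8_t  version;          /* always 2 */
    uint8_t  payload_type;     /* e.g. the dynamic PT negotiated for H.264 */
    uint8_t  marker;           /* e.g. set on the last packet of a frame */
    uint16_t sequence_number;  /* increments by one per packet */
    uint32_t timestamp;        /* sampling instant, in media clock units */
    uint32_t ssrc;             /* identifies the sending source */
};

/* Parse the fixed header from a raw packet; returns 0 on success. */
int rtp_parse(const uint8_t *buf, size_t len, struct rtp_header *h)
{
    if (len < 12)
        return -1;
    h->version         = buf[0] >> 6;
    h->marker          = (buf[1] >> 7) & 0x1;
    h->payload_type    = buf[1] & 0x7f;
    h->sequence_number = (uint16_t)((buf[2] << 8) | buf[3]);
    h->timestamp       = ((uint32_t)buf[4] << 24) | ((uint32_t)buf[5] << 16) |
                         ((uint32_t)buf[6] << 8)  |  (uint32_t)buf[7];
    h->ssrc            = ((uint32_t)buf[8] << 24) | ((uint32_t)buf[9] << 16) |
                         ((uint32_t)buf[10] << 8) |  (uint32_t)buf[11];
    return h->version == 2 ? 0 : -1;
}
```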
I'd like to add that a lot of RTP servers/senders go about sending split datagrams inefficiently.
They use a lot of malloc/free in dynamic buffer contexts.
They also use one syscall per part of the message instead of message vectors (scatter/gather I/O).
To add insult to injury they usually do a lot of time calculation / other handling between sending every part of the datagram.
This causes even more syscalls, sometimes even stretching a packet over a long time, because they have no upper bound on when the packet should be finished, only that it be finished before the next batch of packets is sent.
Inefficient behavior like this seriously gets in the way if you want to scale throughput or run on a low-power embedded CPU. For bandwidth, network, and CPU efficiency reasons, it's usually much better to hand the entire datagram to the kernel in one go and let it deal with fragmentation, rather than have userspace try to figure it out.
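As a sketch of the "message vector" approach mentioned above: on POSIX systems the header and payload parts can stay in separate buffers and still go out as one datagram with a single sendmsg() call. The function name is made up for illustration, and it assumes a connect()ed UDP socket; the kernel then handles any IP fragmentation.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <stdint.h>

/* Send an RTP header, a payload header and the payload itself as one UDP
 * datagram with a single syscall, instead of copying them into one buffer
 * or issuing one send() per part. */
ssize_t send_rtp_datagram(int sock,
                          const uint8_t *rtp_hdr, size_t rtp_len,
                          const uint8_t *pay_hdr, size_t pay_hdr_len,
                          const uint8_t *payload, size_t payload_len)
{
    struct iovec iov[3] = {
        { .iov_base = (void *)rtp_hdr, .iov_len = rtp_len     },
        { .iov_base = (void *)pay_hdr, .iov_len = pay_hdr_len },
        { .iov_base = (void *)payload, .iov_len = payload_len },
    };
    struct msghdr msg = {
        .msg_iov    = iov,
        .msg_iovlen = 3,
        /* msg_name/msg_namelen left zero: assumes a connect()ed UDP socket */
    };
    return sendmsg(sock, &msg, 0);
}
```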
Well, after a lot of thinking about this, there is no reason not to use IP-based fragmentation up to 64 kB (and this will happen if you have a lot of NAL units with the same timestamp that you need to aggregate, via STAP-A for example).
RFC 6184 is clear: you can aggregate up to 64 kB of NAL units this way, since each NAL unit is preceded by a 2-byte (16-bit) size field, although staying below the MTU is preferred.
What happens if the cumulative size of the "single-time" NAL units is larger than 64 kB? RFC 6184 does not say, but I guess you have to send all of your NAL units as separate FU-A packets without increasing the timestamp between them (this is the only case where the Start/End bits in the FU-A header are really useful, since there is no longer a 1:1 match between the End bit and the RTP marker bit).
The RFC states:
An aggregation packet can
carry as many aggregation units as necessary; however, the total
amount of data in an aggregation packet obviously MUST fit into an IP
packet, and the size SHOULD be chosen so that the resulting IP packet
is smaller than the MTU size
When a "single NAL per frame" is larger than the MTU (for example, 1460 bytes with Ethernet), it has to be split with a fragmentation unit packetization (for example, FU-A).
However, nothing in the RFC states that the limit should be 1460 bytes. And it makes sense to have larger than that when doing Ethernet only streaming (as computed above)
If you have a NAL unit larger than 64kB, then you must use FU-A to send it since you can not fit this in a single IP datagram.
The RFC states:
This payload type allows fragmenting a NAL unit into several RTP
packets. Doing so on the application layer instead of relying on
lower-layer fragmentation (e.g., by IP) has the following advantages:
o The payload format is capable of transporting NAL units bigger
than 64 kbytes over an IPv4 network that may be present in pre-
recorded video, particularly in High-Definition formats (there is
a limit of the number of slices per picture, which results in a
limit of NAL units per picture, which may result in big NAL
units).
o The fragmentation mechanism allows fragmenting a single NAL unit
and applying generic forward error correction as described in
Section 12.5.
Which I understand as: "If your NAL unit is less than 64 kB and you don't care about FEC, then don't use FU-A; use a single RTP packet for it."
Another case where FU-A is necessary is when receiving an H.264 stream with RTP over RTSP (interleaved mode). The "packet" size must fit in 2 bytes (16 bits), so you must also fragment larger NAL units even when sending over a reliable stream socket.
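For completeness, here is a minimal sketch of FU-A fragmentation as described in RFC 6184. The emit callback is hypothetical, there is no interleaving, and error handling is kept short; it just shows how the FU indicator and FU header are built from the original NAL header.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical output callback: emits one RTP payload (an FU-A packet). */
typedef void (*emit_fn)(const uint8_t *payload, size_t len, int rtp_marker);

/* Split one NAL unit (starting with its 1-byte NAL header) into FU-A
 * fragments whose payload does not exceed max_payload bytes.
 * Per RFC 6184: FU indicator = the NAL's F/NRI bits + type 28, FU header =
 * S/E/R bits + the original NAL type; the original NAL header byte itself
 * is not repeated in the fragments. */
void fragment_fu_a(const uint8_t *nal, size_t nal_len,
                   size_t max_payload, emit_fn emit)
{
    uint8_t buf[1500];                   /* assumes max_payload <= sizeof(buf) */

    if (nal_len < 2 || max_payload < 3 || max_payload > sizeof(buf))
        return;

    uint8_t fu_indicator = (uint8_t)((nal[0] & 0xE0) | 28); /* F|NRI|type=28 */
    uint8_t nal_type     = nal[0] & 0x1F;

    const uint8_t *p  = nal + 1;         /* skip the original NAL header */
    size_t remaining  = nal_len - 1;
    int first         = 1;

    while (remaining > 0) {
        size_t chunk = remaining;
        if (chunk > max_payload - 2)     /* leave room for the 2 FU bytes */
            chunk = max_payload - 2;
        remaining -= chunk;

        uint8_t fu_header = nal_type;
        if (first)          fu_header |= 0x80;   /* S (start) bit */
        if (remaining == 0) fu_header |= 0x40;   /* E (end) bit   */

        buf[0] = fu_indicator;
        buf[1] = fu_header;
        memcpy(buf + 2, p, chunk);
        /* The RTP marker bit is normally set only on the last packet of a frame. */
        emit(buf, chunk + 2, remaining == 0);

        p += chunk;
        first = 0;
    }
}
```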

Definition of Round Trip Time by using Ping ICMP messages

How is the RTT defined when using a "simple" ping command?
Example (Win7):
ping -l 600 www.google.de
My understanding is:
An ICMP message with a size of 600 bytes will be sent to Google (request). Google copies that message (600 bytes) and sends it back to the sender (reply).
The RTT is the (latency) time for the whole procedure: sending the 600-byte message and getting it back.
Is that right?
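The measurement itself boils down to timestamping the request and the matching reply. A minimal sketch of that principle (generic send/recv on an already-connected socket, since raw ICMP sockets need elevated privileges; for a real ping the request would be an ICMP Echo Request and the reply the matching Echo Reply):

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>

/* Time one request/reply exchange on an already-connected socket.
 * Returns the round-trip time in milliseconds, or -1.0 on error. */
double measure_rtt_ms(int sock, const void *req, size_t req_len,
                      void *reply, size_t reply_cap)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);          /* timestamp the send  */
    if (send(sock, req, req_len, 0) < 0)
        return -1.0;
    if (recv(sock, reply, reply_cap, 0) < 0)      /* wait for the reply  */
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);          /* timestamp the reply */

    return (t1.tv_sec - t0.tv_sec) * 1000.0 +
           (t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```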
Latency typically has two main causes:
1) Distance between the two nodes. This plays a vital role in the resulting latency. For example, consider a scenario where node A and node B need to communicate, sending ICMP messages back and forth.
a) The fewer the hops, the lower the latency; more hops mean more latency.
Solution: you can select an alternate path for the communication, ideally one covering less distance.
2) How busy the network is. Whenever a packet is sent from one network to another, routers process it, which takes some milliseconds each time. All of that time, in both directions, adds up to the measured latency.
a) It also depends on how busy each processing device is: if it is less busy, packets are processed and forwarded faster; if it is busy, they take longer.
Solution: one option is QoS, where you prioritize certain traffic (not ICMP traffic, of course, but some other kind of traffic).

Does libpcap always make a copy of the packet?

I am writing a monitoring program for a very high-traffic network (HD videos are streamed through the network). Most packets are very large, and I only want to watch the headers (IP and UDP/TCP only). Of course, I want to avoid the overhead of copying the entire data. Does libpcap necessarily give me a copy of the whole packet? If so, is there any library that matches my needs?
There appear to be two questions here:
the one in the title, which sounds as if it's asking whether libpcap copies the packet;
the one in the body, asking whether it always copies the entire packet.
For the first question:
There's probably at least one copy done by any code using the mechanisms atop which libpcap runs in various OSes - a copy from the mbufs/skbuff/STREAMS buffers/whatever to the mechanism's buffer. For Linux, when the tpacket mechanism is not being used, the skbuff might just be queued on the receive queue for the PF_PACKET socket libpcap is using.
There may be another copy - a copy from that buffer to userland; if libpcap is using a "zero-copy" mechanism, such as the Linux tpacket mechanism (which libpcap 1.0 and later use by default), the second copy doesn't happen. It will happen if a zero-copy mechanism isn't being used.
However, if you're using pcap_next() or pcap_next_ex() on a Linux system and the tpacket mechanism is being used, a separate copy is made, from the memory-mapped buffer to a private buffer; that doesn't happen if you use pcap_dispatch() or pcap_loop().
For the second question:
That's what the "snaplen" argument to pcap_open_live() and pcap_set_snaplen() is for - it lets you specify that no more than "snaplen" bytes of packet data should be captured, and that means that no more than that many bytes are copied.
Note that this length includes the link-layer headers, and that those can include "metadata" headers such as the radiotap headers you might get on 802.11 adapters. These headers might be variable-length (for example, on 802.11, the 802.11 header is variable-length, and, if you're getting radiotap headers, those are variable-length as well).
In addition, both IPv4 and TCP headers can have options, and IPv6 packets can have extension headers, so the length of IP and TCP headers can also be variable.
This means that you might have to determine a "worst case" snapshot length to use; there's no way to explicitly say "don't give me anything past the TCP/UDP header", you can only say "give me no more than N bytes".
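For example, a capture limited to a generous "worst case" header length might look like the sketch below. The 128-byte snaplen and the "eth0" device name are assumptions for an Ethernet capture; adjust both for your link layer.

```c
#include <pcap/pcap.h>
#include <stdio.h>

/* Callback: hdr->caplen is at most the snaplen, hdr->len is the original
 * packet length on the wire. */
static void handle_packet(u_char *user, const struct pcap_pkthdr *hdr,
                          const u_char *bytes)
{
    (void)user; (void)bytes;
    printf("captured %u of %u bytes\n",
           (unsigned)hdr->caplen, (unsigned)hdr->len);
}

int main(void)
{
    char errbuf[PCAP_ERRBUF_SIZE];

    /* 128-byte snaplen: enough for Ethernet (14 bytes) plus IPv4/IPv6 with
     * options/extension headers plus TCP with options in most cases, while
     * avoiding copies of HD-video payloads. */
    pcap_t *p = pcap_open_live("eth0", 128, 0 /* not promiscuous */,
                               1000 /* read timeout, ms */, errbuf);
    if (p == NULL) {
        fprintf(stderr, "pcap_open_live: %s\n", errbuf);
        return 1;
    }

    pcap_loop(p, -1, handle_packet, NULL);
    pcap_close(p);
    return 0;
}
```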

High "Receiving Time" for HTTP Responses below 500 bytes in Chrome Devtools

While using the DevTools Network tab in Chrome 15 (stable) on Windows 7 and Windows XP, I am seeing cases where the "receiving" time for an HTTP response is >100 ms, but the response is a 302 redirect or a small image (beacon) with a payload below 500 bytes (headers + content).
Capturing the TCP traffic in Wireshark clearly shows the server sent the entire HTTP response in a single TCP packet, so the receiving time should have been 0. A good example is the CNN homepage, or any major website that has a lot of ads and tracking beacons.
This brings up a couple of questions:
What is defined as "receiving" in Chrome DevTools? Is this the time from the first packet to the last packet?
What factors on the client machine/operating system impact "receiving" time, outside of the network/server communication?
In my tests I used a virtual machine for Windows XP, while Windows 7 was on a desktop (quad core, 8 GB RAM).
The "receiving time" is the time between the didReceiveResponse ("Response headers received") and didReceiveData ("A chunk of response data received") WebURLLoaderClient events reported by the network layer, so some internal processing overhead may apply.
In a general case, keep in mind that the HTTP protocol is stream-oriented, so the division of data between TCP packets is not predictable (half of your headers may get into one packet, the rest and the response body may get into the next one, though this does not seem to be your case.)
Whenever possible, use the latest version of Chrome available. It is likely to contain fewer errors, including in the network layer :-)
The Nagle Algorithm and the Delayed ACK Algorithm are two congestion control algorithms that are enabled by default on Windows machines. These will introduce delays in the traffic of small payloads in an attempt to reduce some of the chattiness of TCP/IP.
Delayed ACK will cause ~200ms of additional "Receiving" time in Chrome's network tab when receiving small payloads. Here is a webpage explaining the algorithms and how to disable them on Windows: http://smallvoid.com/article/winnt-nagle-algorithm.html
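On the sending side, applications can opt out of Nagle on a per-socket basis; a minimal sketch using POSIX sockets (delayed ACK is a receiver-side behavior controlled by the OS, e.g. the Windows registry keys in the linked article):

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Disable the Nagle algorithm on a connected TCP socket so small writes
 * (e.g. tiny HTTP responses) are sent immediately instead of being
 * coalesced with later data. Returns 0 on success, -1 on error. */
int disable_nagle(int sock)
{
    int one = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```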