How to resolve tcpdump dropped packets?

I am using tcpdump to capture network packets and am running into an issue where packets start getting dropped. I ran an application which exchanges packets rapidly over the network, resulting in high network bandwidth.
>> tcpdump -i eno1 -s 64 -B 919400
126716 packets captured
2821976 packets received by filter
167770 packets dropped by kernel
Since I am only interested in the protocol-related part of each TCP packet, I want to collect TCP packets without the data/payload. I hope this strategy can also help in capturing more packets before any are dropped. It appears that I can only increase the buffer size (-B argument) up to a certain limit, and even with a higher limit I am dropping more packets than I capture.
Can you help me understand the messages above? My questions are:
What are "packets captured"?
What are "packets received by filter"?
What are "packets dropped by kernel"?
How can I capture all packets at high bandwidth without dropping any? My test application runs for 3 minutes and exchanges packets at a very high rate. I am only interested in protocol-related information, not in the actual data/payload being sent.

From Guy Harris himself:
the "packets captured" number is a number that's incremented every time tcpdump sees a packet, so it counts packets that tcpdump reads from libpcap and thus that libpcap reads from BPF and supplies to tcpdump.
The "packets received by filter" number is the "ps_recv" number from a call to pcap_stats(); with BPF, that's the bs_recv number from the BIOCGSTATS ioctl. That count includes all packets that were handed to BPF; those packets might still be in a buffer that hasn't yet been read by libpcap (and thus not handed to tcpdump), or might be in a buffer that's been read by libpcap but not yet handed to tcpdump, so it can count packets that aren't reported as "captured".
And from the tcpdump man page:
packets ``dropped by kernel'' (this is the number of packets that were dropped, due to a lack of buffer space, by the packet capture mechanism in the OS on which tcpdump is running, if the OS reports that information to applications; if not, it will be reported as 0).
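For reference, those last two counters are what libpcap reports through pcap_stats(); a minimal sketch of reading them from your own code (assuming p is an open pcap_t):

    #include <pcap/pcap.h>
    #include <stdio.h>

    /* Print libpcap's view of the capture statistics for an open pcap_t.
     * ps_recv is "packets received by filter" and ps_drop is
     * "packets dropped by kernel" in tcpdump's summary line. */
    void print_capture_stats(pcap_t *p)
    {
        struct pcap_stat st;
        if (pcap_stats(p, &st) == 0)
            printf("received by filter: %u, dropped by kernel: %u\n",
                   st.ps_recv, st.ps_drop);
        else
            fprintf(stderr, "pcap_stats: %s\n", pcap_geterr(p));
    }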
To attempt to improve capture performance, here are a few things to try (an example command combining several of these appears after the list):
Don't capture in promiscuous mode if you don't need to. That will cut down on the amount of traffic that the kernel has to process. Do this by using the -p option.
Since you're only interested in TCP traffic, apply a capture expression that limits the traffic to TCP only. Do this by appending "tcp" to your command.
Try writing the packets to a file (or files to limit size) rather than displaying packets to the screen. Do this with the -w file option or look into the -C file_size and -G rotate_seconds options if you want to limit file sizes.
You could try to improve tcpdump's scheduling priority via nice.
From Wireshark's Performance wiki page:
stop other programs running on that machine, to remove system load
buy a bigger, faster machine :)
increase the buffer size (which you're already doing)
set a snap length (which you're already doing)
write capture files to a RAM disk
Try using PF_RING.
You could also try using dumpcap instead of tcpdump, although I would be surprised if the performance was drastically different.
You could try capturing with an external, dedicated device using a TAP or Switch+SPAN port. See Wireshark's Ethernet Capture Setup wiki page for ideas.
Another promising possibility: Capturing Packets in Linux at a Speed of Millions of Packets per Second without Using Third Party Libraries.
See also Andrew Brown's Sharkfest '14 Maximizing Packet Capture Performance document for still more ideas.
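For example, a hypothetical invocation combining the non-promiscuous, TCP-only, and write-to-file suggestions above (adjust the interface, filter, and file path to your setup):
>> tcpdump -p -i eno1 -s 64 -B 919400 -w /tmp/capture.pcap tcp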
Good luck!

I would try actually lowering the value of your -B option.
The unit is 1 KiB (1024 bytes), thus the buffer size you specified (919400) is almost 1 gigabyte.
I suppose you would get better results by using a value closer to your CPU cache size, e.g. -B 16384.

Related

Does pcap_breakloop() flush packets in the packet buffer before pcap_loop() returns?

I have a library which uses libpcap to capture packets. I'm using pcap_loop() in a dedicated thread for the capture and pcap_breakloop() to stop the capture.
The packet buffer timeout is set to 500ms.
In some rare cases I am missing the last packets that my application sends before calling pcap_breakloop().
Reading the libpcap documentation I ended up wondering if the packet loss is related to the packet buffer timeout. The documentation says:
packets are not delivered as soon as they arrive, but are delivered after a short delay (called a "packet buffer timeout")
What happens if pcap_breakloop() is called during this delay? Are the packets in the buffer passed to the callback, or are they dropped before pcap_loop() returns?
I was unable to find the answer in the documentation.
Are the packets in the buffer passed to the callback
No.
or are they dropped before pcap_loop() returns?
Yes. In capture mechanisms that buffer packets in kernel code and deliver them only when the buffer fills up or the timeout expires, pcap_breakloop() doesn't force the packets to be delivered.
For some of those capture mechanisms there might be a way to force the timeout to, in effect, expire, but I don't know of any documented way to do that with Linux PF_PACKET sockets, BPF, or WinPcap/Npcap NPF.
Update, giving more details:
On Linux and Windows, pcap_breakloop() attempts to wake up anything that's blocked waiting for packets on the same pcap_t.
On Linux, this is implemented by having the poll() call in libpcap block on both the PF_PACKET socket being used for capturing and on an "event" descriptor; pcap_breakloop() causes the "event" descriptor to supply an event, so that the poll() wakes up even if there are no packets to pick up from the socket yet. That does not force the current chunk in the buffer (memory shared between the kernel and userland code) to be assigned to userland, so those packets are not provided to the caller of libpcap.
On Windows, with Npcap, an "event object" is used by the driver and Packet32 library (the libpcap part of Npcap calls routines in the Packet32 library) to allow the library to block waiting for packets and the driver to wake the library up when packets are available. pcap_breakloop() does a SetEvent() call on the handle for that object, which forces userland code waiting for packets to wake up; it then tries to read from the device. I'd have to spend more time looking at the driver code to see whether, if there are buffered-but-not-delivered packets at that point, they will be delivered.
On all other platforms, pcap_breakloop() does not deliver a wakeup, as the capture mechanism either does no buffering or provides no mechanism to force a wakeup, so:
if no buffering is done, there's no packet buffer to flush;
if there's a timeout, code blocked on a read will be woken up when the timeout expires, and that buffer will be delivered to userland;
if there's no timeout, code blocked on a read could be blocked for an indefinite period of time.
The ideal situation would be if the capture mechanism provided, on all platforms that do buffering, a way for userland code to force the current buffer to be delivered, and thus to cause a wakeup. That would require changes to the NPF driver and Packet32 library in Npcap, and would require kernel changes in Linux, *BSD, macOS, Solaris, and AIX.
Update 2:
Note also that "break loop" means break out of the loop immediately, so even if all of the above were done, when the loop is exited, there might be packets remaining in libpcap's userland buffer. If you want those packets - even though, by calling pcap_breakloop(), you told libpcap "stop giving me packets" - you'll have put the pcap_t in non-blocking mode and call pcap_dispatch() to drain the userland buffer. (That won't drain the kernel buffer.)

libpcap: which platforms support packet buffer timeout via pcap_set_timeout()?

I'd like to have pcap_dispatch() timeout if no packets are received within a set period of time. Similar to this SO question.
In the pcap(3) manpage, it says that not all platforms support that:
Not all platforms support a packet buffer timeout; on platforms that don't, the packet buffer timeout is ignored. A zero value for the timeout, on platforms that support a packet buffer timeout, will cause a read to wait forever to allow enough packets to arrive, with no timeout. A negative value is invalid; the result of setting the timeout to a negative value is unpredictable.
And in this post, user862787 said that "Some OSes time out even if no packets have arrived, others don't":
It's considered platform-specific because it is, but it's not considered buggy (trust me, I'm the person who wrote that text in the man page) - the timeout is to keep pcap_dispatch() from waiting forever for a packet buffer to fill, not to keep it from waiting forever for any packets to arrive at all. Some OSes time out even if no packets have arrived, others don't. – user862787, Oct 19 '12
So how do I know which platforms support and which don't? I've searched and gone through the libpcap source but didn't find anything.
Specifically, what about CentOS 8.1, kernel 4.18.0-147.el8.x86_64, libpcap 1.10?
On systems using the BPF capture mechanism - *BSD, macOS, Solaris 11, AIX - the timeout will occur even if no packets have arrived.
On most versions of most Linux distributions, it won't.
Don't depend on it doing so or not doing so; write your code not to depend on that.
I've searched and gone through the libpcap source but didn't find anything.
You need to look at the source for the capture mechanism libpcap uses on particular platforms, not at the libpcap source.
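One way to avoid depending on platform behaviour is to do the timing out yourself, for example with select() on the capture descriptor instead of relying on the packet buffer timeout. A sketch, assuming a platform where pcap_get_selectable_fd() returns a usable descriptor (Linux does; BPF-based systems have caveats described in the pcap_get_selectable_fd man page):

    #include <sys/types.h>
    #include <sys/select.h>
    #include <pcap/pcap.h>

    /* Wait up to `seconds` for the capture descriptor to become readable,
     * then dispatch whatever is available.  Returns -1 on error, 0 if our
     * own timeout fired with no packets, and pcap_dispatch()'s result
     * otherwise.  The pcap_t should normally be put in non-blocking mode
     * (pcap_setnonblock()) before being used this way. */
    int dispatch_with_timeout(pcap_t *p, int seconds,
                              pcap_handler handler, u_char *user)
    {
        int fd = pcap_get_selectable_fd(p);
        if (fd == -1)
            return -1;          /* no selectable descriptor on this platform */

        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);

        struct timeval tv = { seconds, 0 };
        int r = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (r <= 0)
            return r;           /* 0 = our own timeout, -1 = select error */

        return pcap_dispatch(p, -1, handler, user);
    }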

WebRTC behavior: NACK & FEC

We have a WebRTC application with two peers, and I experience packet loss of around 5% (checked in webrtc-internals) while the call is ongoing. I see NACKs as well.
I want to know whether FEC is being used in my setup. I do see some SDP parameters related to FEC, as below, but I am not sure whether they are used or not.
How can I check whether WebRTC is using FEC?
a=rtpmap:124 red/90000
a=rtpmap:123 ulpfec/90000
Also, are there any suggestions on how to improve the packet loss percentage by tweaking NACKs or FEC, etc.?
I tried different bandwidths and resolutions, and the packet loss is almost the same.
The easiest way to determine whether FEC is actually used is to run a packet capture using Wireshark or tcpdump and look for RTP packets whose payload type matches the values in the SDP (123 and 124 in your example). If you see those packets, you're seeing FEC.
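For example, with the payload types from the SDP above, a Wireshark display filter along these lines (assuming the stream is being decoded as RTP) shows only the RED/ULPFEC packets:
rtp.p_type == 124 || rtp.p_type == 123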
One thing to note: FEC can make packet loss worse in some cases, essentially where you have bursts of back-to-back packets lost because of congestion. FEC transmits additional packets, which allows any one or two lost packets in a group to be recovered from the additional packets.
I found the root cause of the packet loss. It was related to the setup of the network switches. We are using a dedicated leased line, and the leased line expects a fixed 100 Mbps duplex configuration instead of auto-negotiation on the network switch ports. Due to auto-negotiation, the link went into half duplex, hence the FEC errors.

Definition of Round Trip Time by using Ping ICMP messages

How is the RTT defined by the use of a "simple" ping command?
Example (Win7):
ping -l 600 www.google.de
My understanding is:
An ICMP echo message of 600 bytes will be sent to Google (the request). Google copies that message (600 bytes) and sends it back to the sender (the reply).
The RTT is the (latency) time for the whole procedure: sending the request and receiving the 600-byte reply.
Is that right?
Latency typically has two main causes:
1) Distance between the two nodes; this plays a vital role in latency. For example, consider a scenario where Node A and Node B need to communicate, sending ICMP messages to each other.
a) The fewer the hops, the lower the latency; more hops, more latency.
Solution: you can select an alternate path for the communication, perhaps one covering a shorter distance.
2) How busy the network is; whenever a packet is sent from one network to another, routers process the packet, which takes some milliseconds. All of that time, in both directions, adds up to the latency.
a) It depends on how busy the processing device is. If it is less busy, packets are processed and forwarded faster; if it is busy, processing takes longer.
Solution: one possible solution is to use QoS, where you can prioritize the traffic - not ICMP traffic, of course, but some other kind of traffic.

Are there any protocols/standards on top of TCP optimized for high throughput and low latency?

Are there any protocols/standards that work over TCP that are optimized for high throughput and low latency?
The only one I can think of is FAST.
At the moment I have devised just a simple text-based protocol delimited by special characters. I'd like to adopt a protocol designed for fast transfer that perhaps also supports compression and minification of the data that travels over the TCP socket.
Instead of using heavyweight TCP, we can get TCP's connection-oriented/reliable behaviour on top of UDP in either of the following ways:
UDP-based Data Transfer Protocol (UDT):
UDT is built on top of User Datagram Protocol (UDP) by adding congestion control and reliability control mechanisms. UDT is an application level, connection oriented, duplex protocol that supports both reliable data streaming and partial reliable messaging.
Acknowledgment:
UDT uses periodic acknowledgments (ACK) to confirm packet delivery, while negative ACKs (loss reports) are used to report packet loss. Periodic ACKs help to reduce control traffic on the reverse path when the data transfer speed is high, because in these situations, the number of ACKs is proportional to time, rather than the number of data packets.
Reliable User Datagram Protocol (RUDP):
It aims to provide a solution where UDP is too primitive because guaranteed-order packet delivery is desirable, but TCP adds too much complexity/overhead.
It extends UDP by adding the following additional features:
Acknowledgment of received packets
Windowing and congestion control
Retransmission of lost packets
Overbuffering (Faster than real-time streaming)
en.wikipedia.org/wiki/Reliable_User_Datagram_Protocol
If layered on top of TCP, you won't get better throughput or latency than the 'barest' TCP connection.
There are other non-TCP high-throughput and/or low-latency connection-oriented protocols, usually layered on top of UDP.
Almost the only one I know is UDT, which is optimized for networks where high bandwidth or long round-trip times (RTT) make typical TCP retransmissions suboptimal. These are called 'long fat networks' (LFN, pronounced 'elefan').
You may want to consider JMS. JMS can run on top of TCP, and you can get reasonable latency with a message broker like ActiveMQ.
It really depends on your target audience, though. If you're building a game which must run anywhere, you pretty much need to use HTTP or HTTP/Streaming. If you are pushing around market data on a LAN, then something NOT using TCP would probably suit you better. TIBCO RV and JGroups both provide reliable low-latency messaging over multicast.
As you mentioned FAST - it is intended for market data distribution, is used by leading stock exchanges, and runs on top of UDP multicast.
In general, with the current level of network reliability, it is always worth considering putting your protocol on top of UDP.
Anything with a session sequence number, NACK + server-to-client heartbeat, and binary marshalling should be close to theoretical performance.
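As a rough illustration of the binary-marshalling point, a small fixed header in front of every payload (a hypothetical layout, not any particular standard) is usually enough to replace delimiter-based text framing:

    #include <stdint.h>

    /* Hypothetical wire header for length-prefixed binary framing, sent in
     * network byte order ahead of each payload.  The receiver reads the
     * fixed-size header, then exactly `length` payload bytes, so no
     * delimiter scanning or escaping is needed; the sequence number is
     * what a NACK-based scheme would reference. */
    struct frame_header {
        uint32_t sequence;   /* per-session sequence number */
        uint16_t type;       /* application-defined message type */
        uint16_t length;     /* payload length in bytes */
    };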
If you have admin/root privilege on the sending side, you can also try a TCP acceleration driver like SuperTCP.