Does pcap_breakloop() flush packets in the packet buffer before pcap_loop() returns? - libpcap

I have a library which uses libpcap to capture packets. I'm using pcap_loop() in a dedicated thread for the capture and pcap_breakloop() to stop the capture.
The packet buffer timeout is set to 500ms.
In some rare cases I am missing the last packets that my application sends before calling pcap_breakloop().
Reading the libpcap documentation I ended up wondering if the packet loss is related to the packet buffer timeout. The documentation says:
packets are not delivered as soon as they arrive, but are delivered after a short delay (called a "packet buffer timeout")
What happens if pcap_breakloop() is called during this delay? Are the packets in the buffer passed to the callback, or are they dropped before pcap_loop() returns?
I was unable to find the answer in the documentation.

Are the packets in the buffer passed to the callback
No.
or are they dropped before pcap_loop() returns?
Yes. In capture mechanisms that buffer packets in kernel code and deliver them only when the buffer fills up or the timeout expires, pcap_breakloop() doesn't force the packets to be delivered.
For some of those capture mechanisms there might be a way to force the timeout to, in effect, expire, but I don't know of any documented way to do that with Linux PF_PACKET sockets, BPF, or WinPcap/Npcap NPF.
Update, giving more details:
On Linux and Windows, pcap_breakloop() attempts to wake up anything that's blocked waiting for packets on the same pcap_t.
On Linux, this is implemented by having the poll() call in libpcap block on both the PF_PACKET socket being used for capturing and on an "event" descriptor; pcap_breakloop() causes the "event" descriptor to supply an event, so that the poll() wakes up even if there are no packets to pick up from the socket yet. That does not force the current chunk in the buffer (memory shared between the kernel and userland code) to be assigned to userland, so those packets are not provided to the caller of libpcap.
On Windows, with Npcap, an "event object" is used by the driver and Packet32 library (the libpcap part of Npcap calls routines in the Packet32 library) to allow the library to block waiting for packets and the driver to wake the library up when packets are available. pcap_breakloop() does a SetEvent() call on the handle for that object, which forces userland code waiting for packets to wake up; it then tries to read from the device. I'd have to spend more time looking at the driver code to see whether, if there are buffered-but-not-delivered packets at that point, they will be delivered.
On all other platforms, pcap_breakloop() does not deliver a wakeup, as the capture mechanism either does no buffering or provides no mechanism to force a wakeup, so:
if no buffering is done, there's no packet buffer to flush;
if there's a timeout, code blocked on a read will be woken up when the timeout expires, and that buffer will be delivered to userland;
if there's no timeout, code blocked on a read could be blocked for an indefinite period of time.
The ideal situation would be if the capture mechanism provided, on all platforms that do buffering, a way for userland code to force the current buffer to be delivered, and thus to cause a wakeup. That would require changes to the NPF driver and Packet32 library in Npcap, and would require kernel changes in Linux, *BSD, macOS, Solaris, and AIX.
Update 2:
Note also that "break loop" means break out of the loop immediately, so even if all of the above were done, when the loop is exited, there might be packets remaining in libpcap's userland buffer. If you want those packets - even though, by calling pcap_breakloop(), you told libpcap "stop giving me packets" - you'll have to put the pcap_t in non-blocking mode and call pcap_dispatch() to drain the userland buffer. (That won't drain the kernel buffer.)

Related

libpcap: which platforms support packet buffer timeout via pcap_set_timeout()?

I'd like to have pcap_dispatch() time out if no packets are received within a set period of time. Similar to this SO question.
In the pcap(3) manpage, it says that not all platforms support that:
Not all platforms support a packet buffer timeout; on platforms that
don't, the packet buffer timeout is ignored. A zero value for the
timeout, on platforms that support a packet buffer timeout, will cause
a read to wait forever to allow enough packets to arrive, with no
timeout. A negative value is invalid; the result of setting the
timeout to a negative value is unpredictable.
And in this post, user862787 said that "Some OSes time out even if no packets have arrived, others don't":
It's considered platform-specific because it is, but it's not considered buggy (trust me, I'm the person who wrote that text in the man page) - the timeout is to keep pcap_dispatch() from waiting forever for a packet buffer to fill, not to keep it from waiting forever for any packets to arrive at all. Some OSes time out even if no packets have arrived, others don't. – user862787 Oct 19 '12 at 20:53
So how do I know which platforms support and which don't? I've searched and gone through the libpcap source but didn't find anything.
Specifically, what about CentOS 8.1, kernel 4.18.0-147.el8.x86_64, libpcap 1.10?
On systems using the BPF capture mechanism - *BSD, macOS, Solaris 11, AIX - the timeout will occur even if no packets have arrived.
On most versions of most Linux distributions, it won't.
Don't depend on it doing so or not doing so; write your code not to depend on that.
I've searched and gone through the libpcap source but didn't find anything.
You need to look at the source for the capture mechanism libpcap uses on particular platforms, not at the libpcap source.
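One way to write code that doesn't depend on the platform's timeout behavior is to impose your own timeout with select() where pcap_get_selectable_fd() is available. A sketch follows; `handle`, `cb`, and `user` are placeholder names, and note that the pcap man pages warn that select() on BPF devices has caveats of its own (see pcap_get_required_select_timeout()):

```c
#include <pcap/pcap.h>
#include <sys/select.h>

/* Wait up to `ms` milliseconds for packets, then dispatch whatever is
   available.  Returns -1 if this platform has no selectable descriptor,
   0 if our own timeout fired, otherwise whatever pcap_dispatch()
   returned. */
static int dispatch_with_timeout(pcap_t *handle, pcap_handler cb,
                                 u_char *user, int ms)
{
    int fd = pcap_get_selectable_fd(handle);
    if (fd == -1)
        return -1;                      /* not supported here */

    fd_set rfds;
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    struct timeval tv = { ms / 1000, (ms % 1000) * 1000 };

    int ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready <= 0)
        return 0;                       /* our timeout (or a signal) */
    return pcap_dispatch(handle, -1, cb, user);
}
```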

How to resolve tcpdump dropped packets?

I am using tcpdump to capture network packets and running into an issue where I start dropping packets. I ran an application which exchanges packets rapidly over the network, resulting in high network bandwidth.
>> tcpdump -i eno1 -s 64 -B 919400
126716 packets captured
2821976 packets received by filter
167770 packets dropped by kernel
Since I am only interested in the protocol-related part of the TCP packet, I want to collect TCP packets without data/payload. I hope this strategy can also help in capturing more packets before dropping packets. It appears that I can only increase the buffer size (-B argument) up to a certain limit. Even with a higher limit I am dropping more packets than captured.
Can you help me understand the above messages? My questions:
What are "packets captured"?
What are "packets received by filter"?
What are "packets dropped by kernel"?
How can I capture all packets at high bandwidth without dropping any? My test application runs for 3 minutes and exchanges packets at a very high rate. I am only interested in protocol-related information, not in the actual data/payload being sent.
From Guy Harris himself:
the "packets captured" number is a number that's incremented every time tcpdump sees a packet, so it counts packets that tcpdump reads from libpcap and thus that libpcap reads from BPF and supplies to tcpdump.
The "packets received by filter" number is the "ps_recv" number from a call to pcap_stats(); with BPF, that's the bs_recv number from the BIOCGSTATS ioctl. That count includes all packets that were handed to BPF; those packets might still be in a buffer that hasn't yet been read by libpcap (and thus not handed to tcpdump), or might be in a buffer that's been read by libpcap but not yet handed to tcpdump, so it can count packets that aren't reported as "captured".
And from the tcpdump man page:
packets ``dropped by kernel'' (this is the number of packets that were dropped, due to a lack of buffer space, by the packet capture mechanism in the OS on which tcpdump is running, if the OS reports that information to applications; if not, it will be reported as 0).
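An application can read the same counters itself via pcap_stats(). A sketch, assuming `handle` is a live capture handle (pcap_stats() is not available for savefiles):

```c
#include <pcap/pcap.h>
#include <stdio.h>

/* Print the counters tcpdump reports: ps_recv is "received by filter",
   ps_drop is "dropped by kernel". */
static void print_capture_stats(pcap_t *handle)
{
    struct pcap_stat st;
    if (pcap_stats(handle, &st) == 0)
        printf("received by filter: %u, dropped by kernel: %u\n",
               st.ps_recv, st.ps_drop);
}
```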
To attempt to improve capture performance, here are a few things to try:
Don't capture in promiscuous mode if you don't need to. That will cut down on the amount of traffic that the kernel has to process. Do this by using the -p option.
Since you're only interested in TCP traffic, apply a capture expression that limits the traffic to TCP only. Do this by appending "tcp" to your command.
Try writing the packets to a file (or files to limit size) rather than displaying packets to the screen. Do this with the -w file option or look into the -C file_size and -G rotate_seconds options if you want to limit file sizes.
You could try to improve tcpdump's scheduling priority via nice.
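Putting the first few suggestions together, the command might look like this (the interface name and output path are just examples):

```shell
# -p: no promiscuous mode; -s 64: small snaplen; -B: bigger buffer;
# -w: write to a file instead of the terminal; "tcp": capture filter.
tcpdump -p -i eno1 -s 64 -B 919400 -w /tmp/capture.pcap tcp
```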
From Wireshark's Performance wiki page:
stop other programs running on that machine, to remove system load
buy a bigger, faster machine :)
increase the buffer size (which you're already doing)
set a snap length (which you're already doing)
write capture files to a RAM disk
Try using PF_RING.
You could also try using dumpcap instead of tcpdump, although I would be surprised if the performance was drastically different.
You could try capturing with an external, dedicated device using a TAP or Switch+SPAN port. See Wireshark's Ethernet Capture Setup wiki page for ideas.
Another promising possibility: Capturing Packets in Linux at a Speed of Millions of Packets per Second without Using Third Party Libraries.
See also Andrew Brown's Sharkfest '14 Maximizing Packet Capture Performance document for still more ideas.
Good luck!
I would try actually lowering the value of your -B option.
The unit is 1 KiB (1024 bytes), thus the buffer size you specified (919400) is almost 1 gigabyte.
I suppose you would get better results by using a value closer to your CPU cache size, e.g. -B 16384.

Does libpcap free memory itself?

My program uses libpcap like this:
struct pcap_pkthdr header;
const u_char *packet;

while ((packet = pcap_next(handle, &header)) != NULL) {
    ...
    (modify the packet and do the checksum)
    ...
    pcap_sendpacket(handle, packet, header.len);
}
Recently, I found there is a memory leak in my program...
My question is:
Will libpcap free the packet after pcap_next(), or do I have to free it myself?
Will libpcap free the packet after pcap_next()?
The packet is contained in a buffer internal to libpcap (attached to the pcap_t); a new buffer is not allocated for each packet, so the buffer isn't freed after pcap_next(), it's freed after the pcap_t is closed. You do not have to free it yourself.
(This also means that the packet data from a particular call to pcap_next() or pcap_next_ex() is not guaranteed to remain valid after the next call to pcap_next() or pcap_next_ex() - or pcap_loop() or pcap_dispatch(); it might be overwritten with data from the next packet or the next batch of packets.)

pcap_dispatch - callback processing questions

I am writing a fairly simple pcap "live" capture engine; however, the packet-processing callback implementation for pcap_dispatch takes a relatively long time to process each packet.
Does pcap run every "pcap_handler" callback in a separate thread? If yes, is "pcap_handler" thread-safe, or should care be taken to protect it with critical sections?
Alternatively, does the pcap_dispatch callback work in a serial fashion? E.g., is "pcap_handler" for packet 2 called only after "pcap_handler" for packet 1 is done? If so, is there an approach to avoid accumulating latency?
Thanks,
-V
Pcap basically works like this: there is a kernel-mode driver capturing the packets and placing them in a buffer of size B. The user-mode application may request any number of packets at any time using pcap_loop, pcap_dispatch, or pcap_next (the latter is basically pcap_dispatch with one packet).
Therefore, when you use pcap_dispatch to request some packets, libpcap goes to the kernel and asks for the next packet in the buffer (if there isn't one, the timeout code and such kicks in, but this is irrelevant for this discussion), transfers it into userland and deletes it from the buffer. After that, pcap_dispatch calls your handler, reduces its packets-to-do counter and starts from the beginning. As a result, pcap_dispatch only returns if the requested number of packets have been processed, an error occurred, or a timeout happened.
As you can see, libpcap is completely non-threaded, as most C APIs are. The kernel-mode driver, however, is obviously happy enough to deliver packets to multiple threads (else you wouldn't be able to capture from more than one process), and is completely thread-safe (there is one separate buffer for each usermode handle).
This implies that you must implement all parallelisms by yourself. You'd want to do something like this:
pcap_dispatch(P, count, handler, data);
.
.
.
struct pcap_work_item {
    struct pcap_pkthdr header;
    u_char data[];
};

void handler(u_char *user, const struct pcap_pkthdr *header, const u_char *data)
{
    struct pcap_work_item *item =
        malloc(sizeof(struct pcap_work_item) + header->caplen);
    item->header = *header;
    memcpy(item->data, data, header->caplen);
    queue_work_item(item);
}
Note that we have to copy the packet into the heap, because the header and data pointers are invalid after the callback returns.
The function queue_work_item should find a worker thread and assign it the task of handling the packet. Since you said that your callback takes a 'relatively long time', you likely need a large number of worker threads. Finding a suitable number of workers is subject to fine-tuning.
At the beginning of this post I said that the kernel-mode driver has a buffer to collect incoming packets which await processing. The size of this buffer is implementation-defined. The snaplen parameter to pcap_open_live only controls how many bytes of one packet are captured; the number of packets cannot be controlled in a portable fashion. It might be fixed-size. It might get larger as more and more packets arrive. However, if it overflows, all further packets are discarded until there is enough space for the next one to arrive. If you want to use your application in a high-traffic environment, you want to make sure that your *pcap_dispatch* callback completes quickly. My sample callback simply assigns the packet to a worker, so it works fine even in high-traffic environments.
I hope this answers all your questions.

Is long polling more resource efficient than classical polling?

Why is it more efficient for me to hold an HTTP connection open until content arrives and then re-open the connection than to simply open a connection periodically?
Of course the latter scenario is perhaps more hit-or-miss, but I am asking purely from a resource-efficiency point of view.
By keeping a connection open, you are blocking resources but not incurring the overhead of periodically tearing down and setting up connections. Setting up and closing a socket connection is a lot more expensive than the function call suggests: sending the close intent to the connection endpoint, and freeing the kernel resources and memory associated with it. For opening a connection, the same happens in reverse. For allocating kernel resources, there may be serialized calls (depending on the kernel implementation) which can affect overall system performance. Last but not least, the hit-and-miss approach is not a deterministic model.
Let's say you have a thread blocked on a socket waiting for a response (as in Comet). During that time, the thread isn't scheduled by the kernel and other things on the machine can run. However, if you're polling, the thread is busy with brief wait periods. Polling also adds latency, because you won't know of a need to do something until the poll occurs.