Packet capture in RDMA? - tcpdump

Is there any utility like tcpdump in Linux for capturing the traffic which is going over RDMA channel? (Infiniband/RoCE/iWARP)

Old thread, but still:
As Roland pointed out, sniffing RDMA traffic is tricky, because once the endpoints did the initial handshake, traffic goes through network card (HCA) directly to the memory.
The only way to sniff this traffic w/o putting a dedicated HW sniffer on the wire is to have vendor-specific hooks in the network card, and a SW tool that uses these hooks.
If you have Mellanox HCAs, you can use the "ibdump" tool. This tool is also a part of Mellanox OFED package.
If you have other vendor's HW, you need to check with that vendor - you won't find any open-source packet sniffer for all RDMA-capable devices, sorry.

In general, no. One of the main characteristics of RDMA is that all the network processing is done on the adapter, without involving the CPU at all. Typically work requests are queued up directly from userspace to the adapter, without any system call. So there's nowhere for a sniffer to hook in to get traffic.
With that said, for Ethernet protocols, iWARP or IBoE (aka RoCE), you can hook up a system in the middle of a connection and set it up to do forwarding in software (eg the Linux bridge module) and then run tcpdump or wireshark to capture the RDMA traffic that passes through this system. Wireshark even has dissectors for iWARP and IBoE.
For native InfiniBand it is theoretically possible to build something similar (set up an adapter to capture and forward traffic) but as far as I know, no one has done even the needed firmware or driver work to do basic packet sniffing.

Chelsio's T4 device supports a packet trace feature allowing it to replicate ingress/egress offload packets to one of the device's NIC queues. Then you can use tcpdump or whatever on that ethX interface to see the RDMA or TOE packets.

Wireshark can be the one. But the problem is you need an observing server. Enabling the mirror feature, you should be able to receive the ROCE pocket at the observer.

A sure way to capture such traffic is to duplicate it into dedicated capture ports. Those ports might be additional ethernet/IB ports (of additional adapters) in your development machine or they may be located in an additional capture machine.
There are basically 2 ways how to duplicate the traffic:
Configure port-mirroring in your switch. Support for port mirroring is pretty common in managed Ethernet switches, even in cheap ones. This feature is also available in some Mellanox Infiniband switches. You can configure to mirror both directions of a port into another one, although this oversubscribes the receiver if the mirrored port receives and sends at line speed at the same time (full-duplex). In such a situation some frames can't be forwarded to the capture port then and are thus dropped. To avoid this limitation one needs to mirror each direction into a separate capture port.
Connect your network cable to a TAP (target access point) device that duplicates or splits the signal. With optical networking those TAPs are often constructed in a completely passive way and thus don't add much complexity and are relatively cheap to produce (examples). You need one TAP for each fiber, i.e. you always occupy 2 capture ports if you want to capture both directions. TAP devices are available for the fibers and connectors commonly used in Ethernet networks. If your Infiniband hardware uses the same then you should be able to use the same TAP devices there, as well. At least the passive ones.
Once the mirrored/tapped traffic arrive at your capture port(s), you can use standard capture tools such as tcpdump.
For Infiniband there is ibdump, however, depending on the Infiniband software you are using (open-source OFED vs. the proprietary Mellanox OFED) and the host channel adapter (HCA) you might be able to use tcpdump to capture Infiniband traffic, as well.

Related

When using WebRTC, is a peer-to-peer architecture redundant to build a video chat service like Skype?

We're playing around with WebRTC and trying to understand its benefits.
One reason Skype can serve hundreds of millions of people is because of its decentralized, peer-to-peer architecture, which keeps server costs down.
Does WebRTC allow people to build a video chat application similar to Skype in that the architecture can be decentralized (i.e., video streams are not routed from a broadcaster through a central server to listeners but rather routed directly from broadcaster to listener)?
Or, put another way, does WebRTC allow someone to essentially replicate the benefits of a P2P architecture similar to Skype's?
Or do you still need something similar to Skype's P2P architecture?
Yes, that's basically what WebRTC does. Calls using the getPeerConnection() API don't send voice/video data through a centralized server, but rather use firewall traversal protocols like ICE, STUN and TURN to allow a direct, peer-to-peer connection. However, the initial call setup still requires a server (most likely something running a WebSocket implementation, but it could be anything that you can figure out how to get JavaScript to talk to), so that the two clients can figure out that they're both online, signal that they want to connect, and then figure out how to do it (this is where the ICE/STUN/TURN bit comes in).
However, there's more to Skype's P2P architecture than just passing voice/video data back and forth. The majority of Skype's IP isn't in the codecs or protocols (much of which they licensed from Global IP Solutions, which Google purchased two years ago and then open-sourced, and which forms of the basis of Chrome's WebRTC implementation). Skype's real IP is all in the piece of WebRTC which still depends on a server: figuring out which people are online, and where they are, and how to get a hold of them, and doing that in a massively decentralized fashion. (See here for some rough details.) I think that you could probably use the DataStream portion of the getPeerConnection() API to do that sort of thing, if you were really, really smart - but it would be complicated, and would most likely stomp on a few Skype patents. Unless you want to be really, really huge, you'd probably just want to run your own centralized presence and location servers and handle all that stuff through standard WebSockets.
I should note that Skype's network architecture has changed since it was created; it no longer (from what I hear) uses random users as supernodes to relay data from client 1 to client 2; it didn't scale well and caused rampant variability in results (and annoyed people who had non-firewalled connections and bandwidth).
You definitely can build something SKype-like with WebRTC - and more. :-)

SSH compression for X11 forwarding

I’m on the East coast of the United States, SSHing into a server on the West coast.
I’ve managed to get X11 forwarding working so I can launch GUI apps for certain tasks where it’s helpful. However, for all the X11-forwarded apps (especially emacs!), there is so much lag between input (keystrokes, mouse clicks, etc.) and response that it sometimes goes from being incredibly frustrating to potentially harmful—when I intend to do A, but B happens because the lag is so great.
Is SSH compression a potential culprit? What kind of compression should I be using?
X11 graphics take up a lot of bandwidth. If your remote host is some distance away (i.e. not on the LAN), then you'll probably suffer sluggishness in your exported X11 applications.
I'm not sure about SSH compression. Performance may depend on other factors, such as CPU performance. From the ssh man page:
-C Requests compression of all data (including stdin, stdout,
stderr, and data for forwarded X11 and TCP connections). The
compression algorithm is the same used by gzip(1), and the
“level” can be controlled by the CompressionLevel option for pro‐
tocol version 1. Compression is desirable on modem lines and
other slow connections, but will only slow down things on fast
networks. The default value can be set on a host-by-host basis
in the configuration files; see the Compression option.
Here are some other workarounds you can use to make things faster:
Instead of interacting with the GUI using X11 forwarding, consider something else that has better optimization/compression, such as VNC or NX/FreeNX.
Use the terminal version of emacs instead of the GUI version.
As you specifically mentioned emacs: there is the command-line option
-nw, --no-window-system
Tell Emacs not to create a graphical frame. If you use
this switch when invoking Emacs from an xterm(1) window,
display is done in that window.
This can be much faster when working over ssh (as it only has to transfer character, not redraw the whole screen over X11).

Do HTML WebSockets maintain an open connection for each client? Does this scale?

I am curious if anyone has any information about the scalability of HTML WebSockets. For everything I've read it appears that every client will maintain an open line of communication with the server. I'm just wondering how that scales and how many open WebSocket connections a server can handle. Maybe leaving those connections open isn't a problem in reality, but it feels like it is.
In most ways WebSockets will probably scale better than AJAX/HTML requests. However, that doesn't mean WebSockets is a replacement for all uses of AJAX/HTML.
Each TCP connection in itself consumes very little in terms server resources. Often setting up the connection can be expensive but maintaining an idle connection it is almost free. The first limitation that is usually encountered is the maximum number of file descriptors (sockets consume file descriptors) that can be open simultaneously. This often defaults to 1024 but can easily be configured higher.
Ever tried configuring a web server to support tens of thousands of simultaneous AJAX clients? Change those clients into WebSockets clients and it just might be feasible.
HTTP connections, while they don't create open files or consume port numbers for a long period, are more expensive in just about every other way:
Each HTTP connection carries a lot of baggage that isn't used most of the time: cookies, content type, conetent length, user-agent, server id, date, last-modified, etc. Once a WebSockets connection is established, only the data required by the application needs to be sent back and forth.
Typically, HTTP servers are configured to log the start and completion of every HTTP request taking up disk and CPU time. It will become standard to log the start and completion of WebSockets data, but while the WebSockets connection doing duplex transfer there won't be any additional logging overhead (except by the application/service if it is designed to do so).
Typically, interactive applications that use AJAX either continuously poll or use some sort of long-poll mechanism. WebSockets is a much cleaner (and lower resource) way of doing a more event'd model where the server and client notify each other when they have something to report over the existing connection.
Most of the popular web servers in production have a pool of processes (or threads) for handling HTTP requests. As pressure increases the size of the pool will be increased because each process/thread handles one HTTP request at a time. Each additional process/thread uses more memory and creating new processes/threads is quite a bit more expensive than creating new socket connections (which those process/threads still have to do). Most of the popular WebSockets server frameworks are going the event'd route which tends to scale and perform better.
The primary benefit of WebSockets will be lower latency connections for interactive web applications. It will scale better and consume less server resources than HTTP AJAX/long-poll (assuming the application/server is designed properly), but IMO lower latency is the primary benefit of WebSockets because it will enable new classes of web applications that are not possible with the current overhead and latency of AJAX/long-poll.
Once the WebSockets standard becomes more finalized and has broader support, it will make sense to use it for most new interactive web applications that need to communicate frequently with the server. For existing interactive web applications it will really depend on how well the current AJAX/long-poll model is working. The effort to convert will be non-trivial so in many cases the cost just won't be worth the benefit.
Update:
Useful link: 600k concurrent websocket connections on AWS using Node.js
Just a clarification: the number of client connections that a server can support has nothing to do with ports in this scenario, since the server is [typically] only listening for WS/WSS connections on one single port. I think what the other commenters meant to refer to were file descriptors. You can set the maximum number of file descriptors quite high, but then you have to watch out for socket buffer sizes adding up for each open TCP/IP socket. Here's some additional info: https://serverfault.com/questions/48717/practical-maximum-open-file-descriptors-ulimit-n-for-a-high-volume-system
As for decreased latency via WS vs. HTTP, it's true since there's no more parsing of HTTP headers beyond the initial WS handshake. Plus, as more and more packets are successfully sent, the TCP congestion window widens, effectively reducing the RTT.
Any modern single server is able to server thousands of clients at once. Its HTTP server software has just to be is Event-Driven (IOCP) oriented (we are not in the old Apache one connection = one thread/process equation any more). Even the HTTP server built in Windows (http.sys) is IOCP oriented and very efficient (running in kernel mode). From this point of view, there won't be a lot of difference at scaling between WebSockets and regular HTTP connection. One TCP/IP connection uses a little resource (much less than a thread), and modern OS are optimized for handling a lot of concurrent connections: WebSockets and HTTP are just OSI 7 application layer protocols, inheriting from this TCP/IP specifications.
But, from experiment, I've seen two main problems with WebSockets:
They do not support CDN;
They have potential security issues.
So I would recommend the following, for any project:
Use WebSockets for client notifications only (with a fallback mechanism to long-polling - there are plenty of libraries around);
Use RESTful / JSON for all other data, using a CDN or proxies for cache.
In practice, full WebSockets applications do not scale well. Just use WebSockets for what they were designed to: push notifications from the server to the client.
About the potential problems of using WebSockets:
1. Consider using a CDN
Today (almost 4 years later), web scaling involves using Content Delivery Network (CDN) front ends, not only for static content (html,css,js) but also your (JSON) application data.
Of course, you won't put all your data on your CDN cache, but in practice, a lot of common content won't change often. I suspect that 80% of your REST resources may be cached... Even a one minute (or 30 seconds) CDN expiration timeout may be enough to give your central server a new live, and enhance the application responsiveness a lot, since CDN can be geographically tuned...
To my knowledge, there is no WebSockets support in CDN yet, and I suspect it would never be. WebSockets are statefull, whereas HTTP is stateless, so is much easily cached. In fact, to make WebSockets CDN-friendly, you may need to switch to a stateless RESTful approach... which would not be WebSockets any more.
2. Security issues
WebSockets have potential security issues, especially about DOS attacks. For illustration about new security vulnerabilities , see this set of slides and this webkit ticket.
WebSockets avoid any chance of packet inspection at OSI 7 application layer level, which comes to be pretty standard nowadays, in any business security. In fact, WebSockets makes the transmission obfuscated, so may be a major breach of security leak.
Think of it this way: what is cheaper, keeping an open connection, or opening a new connection for every request (with the negotiation overhead of doing so, remember it's TCP.)
Of course it depends on the application, but for long-term realtime connections (e.g. an AJAX chat) it's far better to keep the connection open.
The max number of connections will be capped by the max number of free ports for the sockets.
No it does not scale, gives tremendous work to intermediate routes switches. Then on the server side the page faults (you have to keep all those descriptors) are reaching high values, and the time to bring a resource into the work area increases. These are mostly JAVA written servers and it might be faster to hold on those gazilions of sockets then to destroy/create one.
When you run such a server on a machine any other process can't move anymore.

Peer-To-Peer connection via WiFi

Is it possible ?
Peer-To-Peer connection via WiFi (same Access Point) , how would multiple devices talk on this layer.
Any API available or sources that can be looked into ?
Thanks
Yogurt
The Wi-Fi Alliance on Monday announced that its direct peer-to-peer networking version of WiFi, called WiFi Direct, is now available on several new WiFi devices. The Alliance is also announcing that it has begun the process of certifying devices for WiFi Direct compatibility.
Try researching the state of Bonjour / Avahi / Zeroconf on android. I'm seeing some pages that indicate people have made some progress for certain purposes but nothing like a generic howto or ready to use library.
Unless you know the IP address assigned to the other peer already, you'd have to somehow have devices inform other devices that they exist.
Are these devices definitely going to be on the same subnet? If so you can try messing around with having the devices send out 'broadcast' packets. I have no idea if the Android API lets actual applications receive these though.
The more reliable approach would probably be to run some centralised server somewhere that devices register with when they go online, and give their IP address when they register. Then they can query that server for which other devices are nearby and what their IP addresses are. If this is to be a central server out on the wider internet, then it means that unfortunately that the device discovery part isn't peer to peer - there is some privacy implication. Another thing if this is a central server is that you'll have to design the querying process to cope with NAT so that querying for other devices on your local network is restricted to the right network, but that you also get their IP address.

Design a networking application

Problem:
I need to design a networking application that can handle network failures and switch to another network in case of failure.
Suppose I am using three ethernet connections and one wireless also . At a particular moment only one connection is being used.
How should I design my system so that it can switch to another network in case of failure.
I know this is very broad question but any pointers will help!
I'd typically make sure that there's routing on the network and run one (or more) routing protocol instances on the host. That way network failure is (mostly) transparent to the application, as the host OS takes care of sending packets the right way.
On the open-source side, I have good experiences with zebra and quagga, at least on linux machines.
Create a domain model for this, describing the network elements, the kind of failures you want to be able to detect and handle, and demonstrate that it works. Then plug in the network code.
have one class polling for the connection. If poll timeout fires switch the ethernet settings. For wireless, set the wifi settings to autoconnect and then just enable/disable the wificard.
(But I dont know how you switch the ethernet connection)
First thing I would do is look for APIs that will give me network disconnection events.
I'd also find a way to check the state of the network connections.
These would vary depending on the OS and the Language used so you might want to have this abstracted in your application.
Example:
RegisterDisconnectionEvent(DisconnectionHandler);
function DisconnectionHandler()
{
FindActiveNetworkConnection();
// do something else...
}
A primitive way to do it would be to look out for network disconnection events. Your sequence would be:
Register/poll for network connections status changes. Maintain a list of all active network connections.
Use the first available network connection (Alternately you could sort it based on interface bandwidths, and use the one with highest bandwidth).
When you detect a down connection, use the next active one.
However, if there are implications to the functionality of your application, based on which network connection you use, you are much better off, having either a routing protocol do the job for you, or have a tracking application within your application. This tracking application would track network paths (through various methods like ping, traceroute, etc) across all your available interfaces to see which one can reach the ultimate destination, and use the appropriate network interface.
Also, you could monitor your network interfaces for not just status changes, but also for input/output errors, and change your selection accordingly. This would help you use the most efficient network at any given point of time. But this would need to be balanced with the churn caused by switching a network connection.
If you control all of the involved hosts, Multipath TCP will probe all of your connections and automatically choose the one that works; if multiple connections are working, it will load balance across them.
If you don't control the endpoints, there's no choice but doing the probing in the application. Mosh is an example of an application that does that quite elegantly.
You didn't mention what your application does; perhaps it would be possible to redesign your protocol so that it uses all available connections simultaneously, the way BitTorrent does, and therefore doesn't care about some links being down at any given time?