iWARP update advances RDMA over Ethernet for data center and cloud networks
- 11 December, 2014 07:40
For data center operators, the challenge in selecting a high-performance transport technology is striking the right balance between acquisition, deployment and management costs on the one hand, and support for high-performance capabilities such as remote direct memory access (RDMA) on the other.
The iWARP protocol is the open Internet Engineering Task Force (IETF) standard for RDMA over Ethernet, and offers an interchangeable, RDMA verbs-compliant alternative to specialized fabrics such as InfiniBand. iWARP adapters are fully supported within the OpenFabrics Enterprise Distribution (OFED), with typically no changes needed for applications to migrate from InfiniBand to Ethernet.
With the recent approval of RFC 7306 by the IETF, the iWARP protocol gains a number of features that eliminate differences with the latest InfiniBand capabilities to further enhance the portability of RDMA applications.
The iWARP Open Standard
The iWARP protocol stack layers RDMA transport functionality on top of TCP/IP, leveraging that ubiquitous stack's reach, robustness and reliability. On the wire, iWARP traffic looks like any other TCP/IP traffic, so it requires no special support from switches and routers and no changes to network devices. Thanks to hardware-offloaded TCP/IP, iWARP RDMA NICs offer high-performance, low-latency RDMA functionality and native integration within today's large Ethernet-based networks and clouds.
Initially aimed at high performance computing applications, iWARP is now finding a home in data centers thanks to its availability on high-performance 40G Ethernet NICs and increased data center demand for low latency, high bandwidth, and low server CPU utilization. It has also been integrated into server operating systems such as Microsoft Windows Server 2012 with SMB Direct, which can seamlessly take advantage of iWARP RDMA without user intervention.
RDMA over Ethernet
RDMA has traditionally required deploying a specialized fabric, with associated acquisition and maintenance expenses, as it typically must coexist alongside an Ethernet network in a data center. While InfiniBand is a well-known RDMA interconnect, its performance advantages have traditionally stemmed from advanced physical layers that kept it ahead of Ethernet. With the convergence in high-speed serial link designs across technologies, Ethernet has bridged this gap and now follows an identical speed curve.
Today, 40Gbps Ethernet and FDR InfiniBand offer practically the same application-level performance, while 100Gbps Ethernet and EDR InfiniBand are arriving on the market at the same time. These advances have made Ethernet a serious contender as an RDMA fabric.
An InfiniBand-based RDMA over Ethernet contender (RDMA over Converged Ethernet or RoCE) has recently been released by the InfiniBand Trade Association (IBTA).
RoCE replaces the physical and MAC layers of InfiniBand with their Ethernet equivalents. However, because it lacks a TCP-like reliability layer, it requires "lossless" network operation -- i.e. data center bridging equipment and PAUSE flow control throughout the network. Today's RoCE implementations are not routable, limiting a deployment to a single-hop Ethernet subnet, e.g. a rack.
The just-released revision 2 of the RoCE standard specifies an incompatible layering over UDP/IP to provide routability. However, it does not specify how lossless operation is to be provided over a routed IP network, or how congestion control is to be handled.
In contrast, iWARP is a stable and mature open standard, ratified by the IETF in 2007. iWARP was designed to leverage TCP/IP from the outset, which allows it to operate over cost-effective, regular Ethernet infrastructure and deliver on the promise of true convergence, with Ethernet transporting LAN, SAN and RDMA traffic over a single wire.
iWARP provides all the key RDMA technologies that lower latency and improve efficiency by offloading networking tasks from a server processor:
- Direct data placement, which reads/writes data directly from/to application memory
- Kernel bypass, which alleviates the cost of context switching from user space to kernel space
- Transport acceleration, which leverages protocol engines on a network controller to offload packet processing from the system CPU
iWARP RDMA is deployed on network controllers capable of performing all of the network stack's processing, including connection context management, packet segmentation and reassembly, and interrupt handling. With RDMA, the server processor no longer needs to touch the data or copy payload from a receive buffer to the application buffer.
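In verbs terms, the mechanisms above look roughly like the following fragment. This is a sketch rather than a complete program: it assumes a libibverbs environment with an RDMA-capable NIC, and that `ctx`, `buf` and `len` come from earlier device setup and connection establishment.

```c
/* Sketch: direct data placement and kernel bypass with libibverbs.
   The buffer is registered once with the NIC; thereafter the NIC
   places incoming RDMA payloads directly into it, and the application
   polls the completion queue from user space, with no kernel
   involvement and no data copies on the fast path. */
struct ibv_pd *pd = ibv_alloc_pd(ctx);              /* protection domain */
struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
        IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_WRITE);
struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

/* ... create and connect a queue pair, exchange mr->rkey with the peer ... */

struct ibv_wc wc;
while (ibv_poll_cq(cq, 1, &wc) == 0)
    ;   /* kernel bypass: completions are reaped by user-space polling */
```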
The iWARP standard was recently updated with the IETF's adoption of RFC 7306, which adds two RDMA extensions: atomic operations and immediate data messages. These additions bring in RDMA functionality that is supported in InfiniBand but was not part of the original iWARP specification.
Atomic operations allow clients to implement synchronized access to shared remote memory locations, i.e. to guarantee that operations performed on these locations by different users do not interleave in ways that leave the end result undefined.
Atomics enhance the usability of iWARP in distributed system environments, such as message passing interface (MPI) applications, where jobs executing in parallel across the network can more readily implement synchronization points.
The new RFC also provides support for immediate data messages. This capability allows the upper-layer protocols (ULP) at the sender to provide a small amount of data to be delivered as part of the completion of the RDMA operation, improving the efficiency of delivering small notifications. With this enhancement, iWARP can support the RDMA Write with Immediate message that is found in other RDMA transport protocols.
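At the verbs level, immediate data rides along with a work request. The fragment below is a sketch, assuming `qp`, `local_buf`, `len`, `mr`, and the peer's `remote_addr`/`remote_key` were established during connection setup; the 0x1234 value is an arbitrary example notification.

```c
/* Sketch: posting an RDMA Write with Immediate via libibverbs.
   The payload lands directly in the remote buffer; the 32-bit
   immediate value is delivered separately, in the receiver's
   completion entry. */
struct ibv_sge sge = {
    .addr   = (uintptr_t)local_buf,
    .length = len,
    .lkey   = mr->lkey,
};
struct ibv_send_wr wr = {
    .wr_id      = 1,
    .sg_list    = &sge,
    .num_sge    = 1,
    .opcode     = IBV_WR_RDMA_WRITE_WITH_IMM,
    .send_flags = IBV_SEND_SIGNALED,
    .imm_data   = htonl(0x1234),   /* small notification for the peer */
    .wr.rdma = {
        .remote_addr = remote_addr,
        .rkey        = remote_key,
    },
};
struct ibv_send_wr *bad_wr;
int ret = ibv_post_send(qp, &wr, &bad_wr);
```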
iWARP use is rising in cloud environments, such as Microsoft's recently launched Cloud Platform System (CPS), a turnkey scalable private cloud appliance. CPS is a complete solution consisting of up to 4 racks of Dell hardware running Windows Server 2012, with the hypervisor, the management tools, and the automation needed for a large-scale, software-defined data center. The CPS networking infrastructure includes an Ethernet fabric and an iWARP-enabled storage fabric, which utilizes the SMB Direct protocol for high efficiency, high performance storage networking.
Another emerging and potentially disruptive storage networking technology is NVMe over Fabrics, which marries high-performance SSD drives with an advanced interconnect such as RDMA. The first demonstration of a working NVMe over Fabrics setup used iWARP over 40Gb Ethernet adapters and exceeded performance expectations, delivering the same IOPS as locally attached drives with less than 8 microseconds of added latency.
The need for high performance and high efficiency communications in the data center is growing. With the additional functionality provided by IETF RFC 7306 and plug-and-play support in server operating systems, iWARP now offers all the benefits of RDMA over standard Ethernet with lower cost, simplified deployment and greater ease of use. iWARP products are available in the market today and more are coming, so this is a technology data centers can take advantage of right now.