Multiple Node Networking Using PCIe Interconnects
PCI Express (PCIe) interconnects, and their ability to support low-latency, multiple-node data transfers over copper or optical cables, are gaining momentum in embedded computing. Many current "out-of-the-box" solutions connect standard Intel-based servers in traditional commercial computing environments to shared I/O devices under Windows or Linux. Now emerging is the use of PCIe to provide box-to-box external data paths between rugged embedded systems, as well as the internal data path in backplane architectures such as VPX. Why is this trend developing, and what implementation challenges and possible solutions is the industry seeing?
The proliferation of network applications over the past few decades has led to a ballooning in the number of communication protocols. The government alone is estimated to use over 150 protocols. Some are well known, such as Ethernet, Fibre Channel and InfiniBand; lesser-known examples include Camera Link and SpaceWire. Each new protocol has offered a unique value proposition over its predecessors: Fibre Channel offered reliable delivery, and InfiniBand offered low latency.
Yet many of these protocol ecosystems suffer from insufficient revenue, which keeps them from advancing their capabilities in a timely fashion or funding new product development, and so makes it hard for them to compete. This is driving many protocols, along with their equipment lines and suppliers, toward extinction. Even protocols with strong value propositions and strong revenues have become obsolete, such as asynchronous transfer mode (ATM), which offered superior Quality of Service (QoS).
As consolidation in the communication protocol space continues, network companies such as Accipiter Systems have seen a growing need for protocol-independent transports, which carry a protocol's value proposition forward without requiring its dedicated products and supply chains.
Ethernet is a good example of a protocol-independent transport: Fibre Channel's reliable delivery, for instance, is now available as Fibre Channel over Ethernet (FCoE). Although Ethernet is thriving as the dominant protocol-independent transport and continues to capture new market verticals, it has not been able to capture niches such as the low-latency space served by InfiniBand. Yet InfiniBand is as threatened as ATM was. Having watched ATM become obsolete, government system engineers and chief scientists across multiple services are reluctant to design InfiniBand into next-generation systems that require 20-year lifecycles.
This has left the industry searching for a low-latency, protocol-independent replacement for InfiniBand, especially for inter-box interconnects, and one backed by an economically viable ecosystem of technologies and suppliers. PCIe is an emerging candidate because there is a distinct point at which the market verticals served by Ethernet diverge from the low-latency verticals served by PCIe (Figure 1).
With PCIe as a system interconnect, rack-level system architects benefit from PCIe's high data rates, ease of integration, low latency, low cost and strong supply chain. Accipiter Systems' network products are a good example of how PCIe can span multiple-box systems. Spanning boxes relaxes the expansion constraints of motherboards (limited expansion slots) and chassis (limited module slots), as well as the homogeneous mechanical form factor that both impose. PCIe-spanned systems can now cover a rack and include a heterogeneous suite of processing elements, including FPGA accelerators, GPUs and sequential processors, as well as best-in-class storage elements (Figure 2).
Ethernet is the most common method of moving data between computers that are not in the same box. Inside the box, however, shared memory over standards such as VME has been used for years, because advanced, timing-critical applications need efficient, low-latency data movement.
Although these methods work, the application programming needed to exploit them is costly to develop. Developers wanted the simpler programming models offered by upper-layer protocols such as IP, UDP and TCP. But these protocols carry a heavy price in overhead and latency, and the only remedy has been ever-faster link speeds.
The serial point-to-point connectivity of PCIe provides a ready foundation, since PCI and PCIe are already the interconnects between almost everything inside today's embedded systems. Falling component costs, combined with steady increases in performance, make this a very cost-effective solution.
But there are challenges. PCIe is based on the PCI bus standard, whose basic design assumes that ONE CPU controls EVERYTHING in the box, in a top-down hierarchy. So even with a serial point-to-point protocol like PCIe, how do you get two CPUs to talk to each other if you want all the CPUs to be peers? Fortunately, the standards organizations have provided ways to "bridge" elements so that each CPU node sees a normal top-down PCI structure yet still has "windows" into the other nodes. This is done with special bridges called non-transparent bridges (NTBs). As noted earlier, however, this makes shared-memory programming non-trivial.
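To make the "window" idea concrete, the sketch below models a single NTB memory window as pure address arithmetic: the local CPU reads or writes an address inside a window mapped through one of its BARs, and the bridge forwards the access to the peer node at a translated address. The class name, field names and example addresses are hypothetical; real NTBs are configured through vendor-specific registers and drivers, which this model does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NtbWindow:
    """One NTB memory window (hypothetical model, not a real driver API).

    local_base:  address of the window as seen by the local CPU (via a BAR).
    size:        window length in bytes.
    remote_base: translated base address in the peer node's memory space.
    """
    local_base: int
    size: int
    remote_base: int

    def translate(self, local_addr: int) -> int:
        """Map a local address inside the window to the peer's address."""
        offset = local_addr - self.local_base
        if not 0 <= offset < self.size:
            raise ValueError(f"address {local_addr:#x} is outside the NTB window")
        return self.remote_base + offset


# Illustrative values: a 1 MiB window the local CPU sees at 0xF000_0000,
# forwarded by the bridge to 0x8000_0000 in the peer's memory.
win = NtbWindow(local_base=0xF000_0000, size=1 << 20, remote_base=0x8000_0000)
print(hex(win.translate(0xF000_1000)))  # 0x80001000
```

Each CPU still sees only its own top-down hierarchy; the peer's memory simply appears as one more address range behind the bridge, and each side must program its translation before the other can reach its memory.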
The Industry to the Rescue
VPX, the workhorse of modern embedded systems, is the next-generation descendant of the VME standard. Most of today's VPX single-board computer (SBC) vendors have developed a middleware abstraction layer that simplifies the configuration and setup required within the PCI structure.
Consider all the places where there are bridges, both transparent and non-transparent. Many vendors are also adding PCIe switches to their designs, and each bridge and switch must have its configuration space set up to enable data movement and access between and among all the nodes. Middleware such as Interface Concept's Multiware provides the tools and architecture to manage this environment (Figure 3).
One of the most popular uses of this type of middleware is virtual Ethernet over PCIe, which allows the familiar socket programming model to be used while taking advantage of PCIe's speed and ever-increasing bandwidth. However, this technique still does not provide the low latency required by today's time-critical applications.
When middleware provides programming APIs that give simpler access to a preconfigured "shared memory," the programmer no longer needs to know how to set up the various configuration spaces, or how to set up and use the remote direct memory access (RDMA) schemes within the architecture. The necessary tools come in the middleware packages developed by various vendors, each with its own concepts and methods for setting up the environment.
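As an illustration of what an application might do once middleware has handed it a preconfigured shared region, the sketch below implements a single-producer, single-consumer message ring on top of that region. Everything here is hypothetical: `SharedRing`, its layout and its methods are not any vendor's API, and a local `bytearray` stands in for the memory window that would really be mapped across the PCIe fabric. A real implementation would also need appropriate memory barriers and write-ordering guarantees, which this model omits.

```python
import struct

U32 = struct.Struct("<I")           # little-endian 32-bit field
HEAD_OFF, TAIL_OFF, DATA_OFF = 0, 4, 8


class SharedRing:
    """SPSC message ring over a shared region (hypothetical sketch).

    Layout: [head:u32][tail:u32][circular data area]. Only the producer
    writes head and only the consumer writes tail. Each message is a u32
    length prefix followed by its payload; one byte is kept free so that
    head == tail always means "empty".
    """

    def __init__(self, region: bytearray, init: bool = True):
        if len(region) <= DATA_OFF + U32.size:
            raise ValueError("region too small")
        self.buf = memoryview(region)
        self.cap = len(region) - DATA_OFF
        if init:  # only one side should initialize the shared header
            U32.pack_into(self.buf, HEAD_OFF, 0)
            U32.pack_into(self.buf, TAIL_OFF, 0)

    def _load(self, off: int) -> int:
        return U32.unpack_from(self.buf, off)[0]

    def _copy_in(self, pos: int, data: bytes) -> None:
        # Write bytes into the circular data area, wrapping at the end.
        for i, b in enumerate(data):
            self.buf[DATA_OFF + (pos + i) % self.cap] = b

    def _copy_out(self, pos: int, n: int) -> bytes:
        return bytes(self.buf[DATA_OFF + (pos + i) % self.cap] for i in range(n))

    def send(self, payload: bytes) -> bool:
        head, tail = self._load(HEAD_OFF), self._load(TAIL_OFF)
        used = (head - tail) % self.cap
        need = U32.size + len(payload)
        if need > self.cap - 1 - used:
            return False  # not enough space; caller retries later
        self._copy_in(head, U32.pack(len(payload)) + payload)
        # Publish the new head only after the message body is written.
        U32.pack_into(self.buf, HEAD_OFF, (head + need) % self.cap)
        return True

    def recv(self):
        head, tail = self._load(HEAD_OFF), self._load(TAIL_OFF)
        if head == tail:
            return None  # ring is empty
        (n,) = U32.unpack(self._copy_out(tail, U32.size))
        payload = self._copy_out(tail + U32.size, n)
        U32.pack_into(self.buf, TAIL_OFF, (tail + U32.size + n) % self.cap)
        return payload


region = bytearray(64)                 # stand-in for a mapped PCIe window
tx = SharedRing(region)                # producer node's view
rx = SharedRing(region, init=False)    # consumer node's view
tx.send(b"hello")
print(rx.recv())  # b'hello'
```

The point of the design is that the application only calls `send` and `recv`; the bridge windows, configuration spaces and any RDMA setup that make `region` visible to both nodes stay hidden inside the middleware.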
So, for now, we find ourselves abstracting the programming effort so that multiple applications can be developed against a "fixed" API structure. However, that API is non-standard and works only with the vendor middleware for which it was developed. This solves the immediate problem of taking full advantage of PCIe as an interconnect in multiple-node designs that require low latency and high bandwidth, but we also need to work toward a standard way of operating in a multi-vendor environment at the PCIe level.
Meeting Future Needs
The ecosystem supporting PCIe is constantly evolving. Not only do new generations of the protocol continue to arrive, with Gen 1, 2 and 3 deployed and Gen 4 in the wings, but the industry is also examining the needs of "outside-the-box" cabling. On the surface this may not seem to require much attention, but with the upcoming PCIe generations and new PCIe cable standards such as OCuLink, demand for multiple-node PCIe usage will be greater than in the past.
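To put numbers on the generational gains, the short calculation below derives per-lane bandwidth from each generation's signaling rate (2.5, 5, 8 and 16 GT/s) and line encoding (8b/10b for Gen 1 and 2, 128b/130b for Gen 3 and 4). It accounts only for encoding overhead and ignores packet headers and flow-control traffic, so real application throughput is lower. The function name is illustrative.

```python
# Signaling rate (GT/s) and line-encoding efficiency for each PCIe generation.
GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b
    2: (5.0, 8 / 10),      # 8b/10b
    3: (8.0, 128 / 130),   # 128b/130b
    4: (16.0, 128 / 130),  # 128b/130b
}


def lane_bandwidth_mb_s(gen: int, lanes: int = 1) -> float:
    """Bandwidth per direction in MB/s, after line-encoding overhead only."""
    gt_s, efficiency = GENERATIONS[gen]
    # 1 GT/s = 1e9 transfers (bits) per second per lane; 8 bits per byte.
    return gt_s * 1e9 * efficiency / 8 / 1e6 * lanes


for gen in GENERATIONS:
    print(f"Gen {gen}: x1 = {lane_bandwidth_mb_s(gen):7.1f} MB/s, "
          f"x8 = {lane_bandwidth_mb_s(gen, 8) / 1000:5.2f} GB/s")
```

This works out to roughly 250, 500, 985 and 1,969 MB/s per lane in each direction, so each generation approximately doubles usable bandwidth; Gen 3 achieves its doubling with less than double the signaling rate by switching to the more efficient encoding.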
Given the need for a low-latency, system-aware protocol, a box-to-box PCIe implementation provides a low-level communication scheme with bindings for high-speed, block-level DMA transfers between machines connected over PCIe. And because PCIe can also serve as a lower-level transport beneath TCP/IP, connections onward to the next-level network over Ethernet remain possible.
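On the wire, a block-level DMA write is carried as a series of memory-write transaction layer packets (TLPs). The sketch below shows how a block might be split to respect two PCIe rules: each TLP payload may not exceed the link's negotiated Max_Payload_Size, and a memory request may not cross a 4 KB address boundary. The function name and the 256-byte default are assumptions for illustration; in practice the DMA engine performs this splitting in hardware.

```python
from typing import List, Tuple

FOUR_KB = 4096
VALID_MPS = (128, 256, 512, 1024, 2048, 4096)  # allowed Max_Payload_Size values


def split_into_tlps(addr: int, length: int,
                    max_payload: int = 256) -> List[Tuple[int, int]]:
    """Split a block write into (address, length) chunks that each fit in one
    memory-write TLP: no larger than max_payload and never crossing 4 KB."""
    if max_payload not in VALID_MPS:
        raise ValueError("max_payload must be a valid Max_Payload_Size")
    chunks = []
    while length > 0:
        to_boundary = FOUR_KB - (addr % FOUR_KB)  # bytes left before next 4 KB line
        n = min(length, max_payload, to_boundary)
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks


# A 512-byte write starting 128 bytes below a 4 KB boundary:
for a, n in split_into_tlps(0x1F80, 512):
    print(f"TLP at {a:#06x}, {n} bytes")
```

The first chunk stops at the 4 KB boundary, after which the remaining data is sent in Max_Payload_Size pieces. Larger payload sizes mean fewer headers per byte moved, which is one reason block transfers are more efficient than many small ones.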
Applications that require PCIe-class transfer rates include computer clusters, data acquisition, RAM-based data storage and other storage systems. For some time, PCIe has served as a low-level backbone data path, offering a widely supported protocol and a forward path for long-term system developers, but the APIs above the hardware-level protocol still need more standardization. Emerging market needs may become the catalyst that moves this definition forward.
This article was written by David Hinkle, Senior FAE Systems, Elma Electronic Inc. (Fremont, CA), with contributions from Dan Flynn, CEO, Accipiter Systems (Wexford, PA).