Using High-Performance Computing Clusters to Support Fine-Grained Parallel Applications
A custom-built serial board connects FPGAs to accelerate performance.
A heterogeneous cluster composed of host processors and field programmable gate arrays (FPGAs) was used to accelerate the performance of parallel fine-grained applications using a direct FPGA-to-FPGA communications channel. The communications channel is implemented with an all-to-all board that attaches directly to the FPGA boards via their I/O interface. Parallel Discrete Event Simulation (PDES) was used to demonstrate the acceleration performance.
Previous efforts to accelerate PDES found that the communication subsystem is a major performance bottleneck. Initial efforts to exploit the FPGAs on a Heterogeneous High Performance Cluster (HHPC) to accelerate a PDES simulation were also reported. The goal of that study was to use the FPGA boards to accelerate some critical simulation subsystems. Because PDES is fine-grained and communication with the FPGA board is expensive, it is almost impossible to use the FPGAs to optimize the simulation kernel.
In response to this limitation, an alternative channel for the FPGAs to communicate without having to interrupt the primary host processor was created. To achieve this, a serial all-to-all connector board that provides direct, low-bandwidth, low-latency connectivity among the FPGA boards was designed. This board provided a channel for the FPGAs to communicate directly, potentially greatly improving the performance of fine-grained applications with components of the computation residing on the FPGAs.
To demonstrate such an application, the Global Virtual Time (GVT) computation was used as a target for FPGA implementation. Each node provides its local time and message counts to its FPGA board when it enters the GVT computation phase and whenever its count of messages in transit changes. The boards communicate with each other to determine the global count of messages in transit. When that count reaches 0, they compute the minimum of the local times and broadcast it to all the host processors.
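The GVT rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the FPGA implementation: the `NodeReport` type, its `sent` and `received` counters, and the function name `try_compute_gvt` are assumptions introduced here, since the brief says only that each node supplies its local time and message counts.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NodeReport:
    """Hypothetical per-node report handed to the FPGA board."""
    local_time: float  # node's local virtual time
    sent: int          # messages this node has sent (assumed counter)
    received: int      # messages this node has received (assumed counter)


def try_compute_gvt(reports: Sequence[NodeReport]) -> Optional[float]:
    """Return the GVT if no messages are in transit, otherwise None."""
    if not reports:
        return None
    # Global in-transit count: everything sent minus everything received.
    in_transit = sum(r.sent - r.received for r in reports)
    if in_transit != 0:
        # Some message is still in flight; wait for updated counts.
        return None
    # With nothing in transit, GVT is the minimum of the local times.
    return min(r.local_time for r in reports)
```

In this sketch a caller would re-run `try_compute_gvt` each time a node's counts change, and broadcast the result to the hosts once it is no longer `None`.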
The all-to-all board was tested for functionality and performance to establish the baseline physical rate at which it can communicate. Support for communication over the all-to-all board also had to be developed: the equivalent of a link layer for this communication channel.
The HHPC is a Beowulf cluster made of off-the-shelf PCs (featuring dual Intel Xeon processors) interconnected via a Gigabit Ethernet Network and a Myrinet network. In addition, each node has an (AMD) Wildstar II FPGA board on the PCI bus. The Wildstar has a Xilinx Virtex II FPGA, some DRAM and SRAM banks, and an LVDS I/O card. The I/O card was used to interconnect the FPGAs directly to each other using a custom-built all-to-all serial board. This board provides connectivity from every node to every other node concurrently using a dedicated serial line. This results in a low-latency but low-bandwidth communication channel among the FPGAs.
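To give a sense of what an all-to-all board implies for wiring, the following sketch enumerates one dedicated line per pair of nodes. It assumes a single bidirectional line per pair, which the brief does not specify, and the function name `all_to_all_links` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def all_to_all_links(num_nodes: int) -> List[Tuple[int, int]]:
    """List one dedicated link per unordered pair of nodes (assumed layout)."""
    return list(combinations(range(num_nodes), 2))


if __name__ == "__main__":
    # A hypothetical 4-node example; the brief does not state the node count.
    links = all_to_all_links(4)
    print(len(links), "links:", links)
```

Under this assumption, n nodes need n(n-1)/2 lines and each node terminates n-1 of them, which is consistent with the board trading bandwidth per line for direct, low-latency connectivity.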
Without this connectivity, all communication must go through the cluster's communication fabric, at a latency ranging from about 10 microseconds (for Myrinet) to several tens of microseconds (for Gigabit Ethernet). Typically, FPGA boards are used to accelerate sequential or high-granularity parallel applications that have high data parallelism or unusual data paths. PDES does not fit this profile: it is fine-grained and does not, in general, require high data parallelism.
This work was done by Nael Abu-Gazaleh of the State University of New York – Binghamton for the Air Force Research Laboratory.
AFRL-0118
This Brief includes a Technical Support Package (TSP).

Using High-Performance Computing Clusters to Support Fine-Grained Parallel Applications
(reference AFRL-0118) is currently available for download from the TSP library.