Using High-Level Language to Implement Floating-Point Calculations on FPGAs

High-level languages reduce the complexity of hardware design.

The scientific community is interested in using field-programmable gate arrays (FPGAs) for scientific computations because they can be targeted for specific applications and achieve greater throughput at a lower power cost. However, these gains can usually only be achieved by a user with expert knowledge of hardware design. Therefore, despite improvements in FPGA technology that have allowed their use to become attractive for a wider range of applications, inexperience with hardware design remains a barrier for many.

Figure: Data flow between the Mitrion-C and host programs. Each of the Quad-Data Rate (QDR) memories directly available to the Virtex-II Pro contains 4 MB of space for input/output, for a total of 16 MB of input and output.

High-level languages use a variety of approaches to reduce the complexity of hardware design. In this project, Mitrion-C was used because it was readily available at the Naval Research Laboratory, and because it is a commercial product with fast and effective support services. Mitrion-C makes hardware design more accessible in two ways. First, algorithms are described in the Mitrion-C programming language, which uses “C-like” syntax and structures such as functions and loops. Second, the Mitrion Integrated Development Environment (IDE) packages together a user interface, compiler, and simulator.

In hardware design using a traditional hardware description language (HDL) such as Very High Speed Integrated Circuit HDL (VHDL), both simulation and synthesis are time-consuming and synthesis can often fail, requiring modification of the code. The Mitrion IDE simulates and generates VHDL in one step and also estimates whether a design will fit, based on the target hardware’s limitations. Therefore, as long as there are no syntax errors in the Mitrion code, the VHDL synthesis will most likely be successful, with the exception of cases where resource consumption exceeds the resources of the FPGA by a very small margin. One downside of using a high-level language is that the hardware designer loses a level of control. Although Mitrion-C offers explicit options for pipelining, how it achieves its optimizations is opaque to the user.

The simulation of the interaction of a ray of light with an optical element — assuming that the element is a conic surface — requires several calculations. This project looked at two in particular: the intersection point of a ray with an element, and the vector normal to the element’s surface at the point of intersection.

Mitrion-C version 1.4 was used to implement the two calculations. Each of the Quad-Data Rate (QDR) memories directly available to the Virtex-II Pro contains 4 MB of space for input/output, for a total of 16 MB of input and output. Since many scientific applications require more than 16 MB of input and output, a host program is needed to marshal data between the FPGA’s memory and host memory present on the same compute node.

The host program was written using the American National Standards Institute’s standard for C (ANSI-C), and run on one of the Advanced Micro Devices (AMD) Opteron 275 processors on the same compute node as the FPGA. The Cray XD1 supercomputer used in this project uses an interconnect system that allows data transfer between the FPGA and host RAM at a rate of 3.2 GB/s. Mitrion-C uses the full bandwidth provided by Cray.

In the host program, each of the FPGA’s QDR memories is treated as an array. The host program loads values into the arrays, sends the FPGA a start signal using a function provided by Mitrionics, and reads the results after it receives a done signal back from the FPGA.
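
The load, start, wait, and read cycle described above, repeated for data sets larger than the FPGA's memory, might look roughly like the following C sketch. The brief does not name the Mitrionics functions, so `mock_fpga`, `mock_start`, `mock_wait_done`, and `run_in_batches` are hypothetical stand-ins that simulate the device in software. The sketch also simplifies the layout to one input bank and one output bank and assumes that 4 MB means 4 x 1024 x 1024 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One 4 MB bank of 64-bit words (binary megabytes assumed). */
#define BANK_WORDS ((size_t)(4u * 1024u * 1024u / 8u))

/* Hypothetical software stand-in for one input bank and one output bank. */
typedef struct {
    double in[BANK_WORDS];
    double out[BANK_WORDS];
    int done;
} mock_fpga;

/* Stand-in for the vendor's start call: this simulated "FPGA" simply
   doubles each input word, then raises its done flag. */
static void mock_start(mock_fpga *f, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        f->out[i] = 2.0 * f->in[i];
    f->done = 1;
}

/* Stand-in for waiting on the done signal; a real host would block or poll. */
static void mock_wait_done(const mock_fpga *f)
{
    assert(f->done);
}

/* Process `total` words from src into dst, one bank-sized batch at a time.
   Returns the number of batches sent to the device. */
size_t run_in_batches(mock_fpga *f, const double *src, double *dst, size_t total)
{
    size_t off, batches = 0;

    for (off = 0; off < total; off += BANK_WORDS) {
        size_t n = total - off;

        if (n > BANK_WORDS)
            n = BANK_WORDS;
        memcpy(f->in, src + off, n * sizeof(double));   /* load inputs   */
        f->done = 0;
        mock_start(f, n);                               /* start signal  */
        mock_wait_done(f);                              /* done signal   */
        memcpy(dst + off, f->out, n * sizeof(double));  /* read results  */
        batches++;
    }
    return batches;
}
```

A real host program would replace the two stand-in calls with the vendor-provided functions and would spread each batch across all four QDR memories rather than a single bank.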

The Mitrion-C program was split into three functions that 1) read the inputs from QDR memory, 2) performed floating-point calculations, and 3) wrote the results to a different QDR memory. Data was stored in a list data structure and the program was run in a foreach loop. This combination explicitly instructs the Mitrion compiler to pipeline the design.

As a benchmark, the performance of the Mitrion-C implementations of the ray-intersection and normal-vector calculations was compared with that of ANSI-C programs. Each of the 4 MB memories available to the Virtex-II Pro has a bit width of 64 bits. Although all four of the FPGA’s memories were used for input, two of the memories had to be used for output as well. Mitrion-C provides memory synchronization commands that enable bidirectional use of the FPGA’s memories with no effect on throughput.

As mentioned before, the maximum bandwidth of the interconnect, between the FPGA’s QDR memories and the host memories, is 3.2 GB/s. This means that each of the four QDR memories makes up 800 MB/s of that total. Since each FPGA memory can read or write 64 bits (8 bytes) every clock cycle, the 100-MHz clock used by Mitrion makes use of the maximum 800 MB/s bandwidth of the memories.
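
The arithmetic in this paragraph can be checked with a few lines of C. This is only a sketch: the function names are illustrative, and megabytes are taken as decimal (10^6 bytes) so that bytes per cycle multiplied by the clock rate in MHz gives MB/s directly.

```c
#include <assert.h>

/* Peak bandwidth of one memory in MB/s (decimal megabytes):
   bytes transferred per clock cycle times the clock rate in MHz. */
unsigned long bank_bandwidth_mb_s(unsigned bus_bits, unsigned clock_mhz)
{
    assert(bus_bits % 8 == 0);
    return (unsigned long)(bus_bits / 8) * clock_mhz;
}

/* Aggregate peak bandwidth across several identical memories. */
unsigned long total_bandwidth_mb_s(unsigned banks, unsigned bus_bits,
                                   unsigned clock_mhz)
{
    return banks * bank_bandwidth_mb_s(bus_bits, clock_mhz);
}

/* Fraction of the peak achieved by a measured transfer rate. */
double bandwidth_efficiency(double measured_mb_s, unsigned long peak_mb_s)
{
    assert(peak_mb_s > 0);
    return measured_mb_s / (double)peak_mb_s;
}
```

With a 64-bit bus and a 100-MHz clock, `bank_bandwidth_mb_s(64, 100)` gives 800 MB/s per memory, and four such memories give 3200 MB/s, matching the 3.2 GB/s interconnect figure. Applied to the 799.04 MB/s measured for the normal-vector calculation, `bandwidth_efficiency(799.04, 800)` comes to about 0.9988 of the peak.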

Measurements confirmed that a throughput very near the limit of the memories (799.04 MB/s in the case of the normal-vector calculation) could be maintained over a large sample of data. Mitrion-C is a straightforward way to achieve the maximum throughput allowed by the memory bandwidth, provided that the intended design fits on the target FPGA.

This work was done by Kevin K. Liu, Charles B. Cameron, and Antal A. Sarkady of the US Naval Academy. NRL-0057

This Brief includes a Technical Support Package (TSP).
"Using High-Level Language to Implement Floating-Point Calculations on FPGAs" (reference NRL-0057) is currently available for download from the TSP library.
