VPX-SC N-dimensional Supercomputing Architectures Come To The Critical Embedded Systems Market
From the first computers of the 1940s through the machines of the 1990s, all computer systems were CPU-bound. In other words, the I/O interfaces could deliver more data than the CPU could process. In the 1990s, Moore’s Law took over and clock speeds doubled every 18 months, along with the addition of multi-core processors. So, from 1990 through today, we have been I/O-bound, meaning CPUs can now process more data than the I/O links can deliver. Increases in CPU performance have been revolutionary, while the increases in interconnect bandwidth have been incremental for many decades. However, bandwidth increases in RapidIO, InfiniBand, and Ethernet are breaking this bottleneck, giving us the ability to design incredibly powerful embedded supercomputing architectures for today’s data-intensive applications.
From PCI to InfiniBand
To date, InfiniBand has been used to hook together hundreds or thousands of processors to build clustered Linux servers. So why, after all these years, has Intel finally taken an interest in high-speed supercomputing interconnects like InfiniBand? In two words: Cloud Computing. And what is the primary application in the Cloud? It’s data mining. Google, Facebook, LinkedIn, Amazon, Yahoo... they are all drooling over the prospects of data mining. But building these advanced machines will require entirely new architectures that break the chains that bound us in the past.
And yes, there are many applications for supercomputing architectures in embedded systems, especially in the military. These applications range from advanced radar systems to sonar, signal intelligence (SIGINT), communications intelligence (COMINT), systems that run SWARM algorithms (for squadrons of UAVs and UUVs), and electronic warfare (EW) systems. In addition, data mining is another arena where supercomputers can be used, in military intelligence data gathering and analysis.
The cloud-based computing machines will be highly commoditized boxes made in China, so that market is not very attractive. But, the components used in those commodity-oriented cloud machines (InfiniBand chips and the advanced CPUs) can be used, under the influence of intelligent thinking, to build extremely powerful embedded supercomputers.
I/O and Architecture
So, we moved to high-speed serial differential signals for the I/O early in the 2000s. When you start trying to calculate the bandwidths of these serial links, you run into esoteric, laborious technical arguments about frequency, bit rate, baud rate, and true speeds. Rather than do calculations here, and get sucked into that morass, take a look at this table (http://en.wikipedia.org/wiki/List_of_device_bit_rates) and draw your own conclusions. Let’s just say we are running serial links at over 200 MB/s today, and that speed is doubling about every 3-4 years. Now, we have the ability to break the I/O-bound problems in computer architectures.
Next, we have to look at how we can hook processors together efficiently with these links. All computer architectures of the past (and most in the present) are crude 2-dimensional architectures. Even the switched-serial and point-to-point architectures (stars, meshes, etc.) are 2-D. The first 3-D architecture you can build is a cube. The total number of nodes in an N-dimensional architecture is $2^n$, where $n$ is the number of dimensions in the architecture. For a cube, that’s 8. The number of links on each node is equal to the number of dimensions $n$ of the architecture, or 3 for a cube. The total number of links in the system is $n \times 2^{n-1}$, or 12 for the cube. And most importantly, how many hops (i.e., how many links must the data cross from node to node) does it take before the data arrives at its destination in the worst case? For all N-dimensional architectures, that’s the same as the number of dimensions, $n$. For the 3-D cube, that means 3 hops in the worst case.
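The counts quoted above can be checked by construction rather than by formula. The following is a minimal sketch, assuming each node is labeled with an n-bit address and is linked to every node whose address differs in exactly one bit; the function names are illustrative and do not come from any VPX or InfiniBand specification.

```python
from collections import deque


def hypercube_neighbors(node: int, n: int) -> list[int]:
    """Neighbors of a node in an n-dimensional hypercube: flip one address bit."""
    return [node ^ (1 << bit) for bit in range(n)]


def hypercube_metrics(n: int) -> dict[str, int]:
    """Count nodes, links per node, total links, and worst-case hops."""
    nodes = 1 << n  # 2^n nodes
    # Every link is counted once from each end, so halve the degree sum.
    total_links = sum(len(hypercube_neighbors(v, n)) for v in range(nodes)) // 2
    # Breadth-first search from node 0 finds the longest shortest path.
    # The hypercube is symmetric, so every node sees the same worst case.
    dist = {0: 0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in hypercube_neighbors(v, n):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {
        "nodes": nodes,
        "links_per_node": n,
        "total_links": total_links,
        "max_hops": max(dist.values()),
    }


if __name__ == "__main__":
    for dims in (3, 4, 6):
        print(dims, hypercube_metrics(dims))
```

For the 3-D cube this reproduces 8 nodes, 3 links per node, 12 links in total, and 3 hops in the worst case.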
In the past, there have been some aberrant architectures used to build multiprocessor systems. Take a ring, for example. The problem with a ring is that the worst-case number of hops is (n-1), where (n) here is the number of nodes, not the number of dimensions. So, the bigger the ring, the greater the latency. Additionally, if you break a ring at any place, the whole machine dies. So, designers connected two counter-rotating rings to each node, in case one ring failed. That requires 4 links per node, and the maximum number of hops is the number of nodes divided by 2 (n/2). So, the bigger the ring, the more latency you introduce here too. DEC took this one step further with the Torus architecture (using PDP-11 minicomputers). A torus consists of counter-rotating rings at right angles to other counter-rotating rings. Here again, the maximum number of hops is (n/2), but the number of links per node goes up to 6, which is not a good trade-off. Along the way, there were trees, fat trees, and variations on the theme of rings (http://pg-server.csc.ncsu.edu/mediawiki/index.php/PG_MediaWiki:Community_Portal). All these techniques fall apart when you get above 8 processors.
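To put numbers on the ring latency argument, here is a small sketch that brute-forces the worst-case hop count for a single ring and for a pair of counter-rotating rings, and sets it beside the hop count of a hypercube with the same number of nodes. It assumes the node count is a power of two so that the hypercube comparison is exact; the function name is hypothetical.

```python
def ring_worst_case_hops(nodes: int, counter_rotating: bool) -> int:
    """Worst-case hops from node 0 to any other node on a ring."""
    worst = 0
    for dst in range(nodes):
        forward = dst                      # hops travelling in one direction
        backward = (nodes - dst) % nodes   # hops travelling in the other direction
        # A single ring carries data one way only; the counter-rotating
        # pair lets the sender pick the shorter direction.
        hops = min(forward, backward) if counter_rotating else forward
        worst = max(worst, hops)
    return worst


if __name__ == "__main__":
    for nodes in (8, 16, 64):
        dims = nodes.bit_length() - 1  # log2(nodes) for a power of two
        print(
            f"{nodes:3d} nodes: single ring {ring_worst_case_hops(nodes, False):2d} hops, "
            f"counter-rotating {ring_worst_case_hops(nodes, True):2d} hops, "
            f"hypercube {dims} hops"
        )
```

At 64 nodes the single ring needs up to 63 hops and the counter-rotating pair up to 32, against 6 for a hypercube of the same size.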
So, to overcome the peculiar inefficiencies in these deviant 2-D and 3-D structures, we must increase the number of dimensions in the architecture. The first 4-dimensional architecture we can build is a hypercube (Figure 1). The number of nodes or processors, $2^n$, is 16. The number of links per node is the number of dimensions $n$, or 4. The total number of links in the system, $n \times 2^{n-1}$, is 32. The maximum number of hops, $n$, is 4 (Figure 2). Take this to a 6-dimensional hypercube (Figure 3) and you get 64 processor nodes, 6 links per node, 192 total links in the system, and a maximum of 6 hops.
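The hop bound follows from how a message can be steered through a hypercube. The sketch below uses dimension-order (often called e-cube) routing, a common textbook scheme that is an assumption here, since the article does not name a routing method. Each hop corrects one differing address bit, so the hop count equals the number of bits in which source and destination differ, and can never exceed the number of dimensions.

```python
def ecube_route(src: int, dst: int, n: int) -> list[int]:
    """Path from src to dst in an n-dimensional hypercube, fixing bits low to high."""
    if not (0 <= src < (1 << n) and 0 <= dst < (1 << n)):
        raise ValueError("node addresses must fit in n bits")
    path = [src]
    current = src
    for bit in range(n):
        mask = 1 << bit
        if (current ^ dst) & mask:  # addresses differ in this dimension
            current ^= mask         # cross the link along this dimension
            path.append(current)
    return path


if __name__ == "__main__":
    # Opposite corners of the 4-D hypercube: the worst case, 4 hops.
    route = ecube_route(0b0000, 0b1111, 4)
    print([format(node, "04b") for node in route], "hops:", len(route) - 1)
```

Because the path depends only on the two addresses, each node can pick the next link locally without a routing table, which is one reason this scheme is often paired with hypercube interconnects.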
Wire up these 4- and 6-dimensional architectures with optical links, using GPGPUs (General-Purpose Graphics Processing Units) as the processors, and we can build extraordinarily powerful supercomputers in a small box. These machines will certainly appeal to many of the I/O-bound, algorithm-driven, data-intensive applications we have today.
This article was written by Ray Alderman, Executive Director, VITA (Fountain Hills, AZ). For more information, contact Mr. Alderman at