We present a scalable design for accelerating the solution of dense linear systems of equations using LU decomposition. A novel systolic array architecture, intended as a building block for scientific applications, is described and prototyped on a Xilinx Virtex-6 FPGA. The solver achieves a throughput of around 3.2 million linear systems per second for matrices of size N=4 and around 80 thousand linear systems per second for matrices of size N=16. Compared with similar work, our design offers up to a 12-fold improvement in speed while requiring up to 50% fewer hardware resources. As a result, a linear system of size N=64 can be implemented on a single FPGA, whereas previous work was limited to N=12 and resorted to complex multi-FPGA architectures to scale. Finally, the design can be adapted to different problem sizes with minimal effort. © 2014 IEEE.
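The abstract does not spell out the computation being accelerated. As a point of reference, here is a minimal software sketch of solving Ax = b by LU decomposition: a Doolittle factorization A = LU (L unit lower-triangular, U upper-triangular), then forward substitution for Ly = b and back substitution for Ux = y. This is not the paper's systolic-array design. The function names `lu_decompose` and `lu_solve` are hypothetical, and the sketch assumes that no pivoting is needed, meaning every pivot `upper[i][i]` is nonzero (e.g. for diagonally dominant matrices). The paper's handling of pivoting is not stated here.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Doolittle LU factorization without pivoting: A = L @ U.

    Assumption: every pivot upper[i][i] is nonzero (no row exchanges).
    """
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U: subtract contributions of previously computed rows.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        # L has a unit diagonal in the Doolittle form.
        lower[i][i] = 1.0
        # Column i of L below the diagonal, scaled by the pivot upper[i][i].
        for j in range(i + 1, n):
            lower[j][i] = (
                a[j][i] - sum(lower[j][k] * upper[k][i] for k in range(i))
            ) / upper[i][i]
    return lower, upper


def lu_solve(lower: Matrix, upper: Matrix, b: List[float]) -> List[float]:
    """Solve L U x = b by forward then back substitution."""
    n = len(b)
    # Forward substitution: L y = b (unit diagonal, so no division).
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(i))
    # Back substitution: U x = y, from the last row upward.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))) / upper[i][i]
    return x


if __name__ == "__main__":
    # 4x + 3y = 10, 6x + 3y = 12  ->  x = 1, y = 2
    a = [[4.0, 3.0], [6.0, 3.0]]
    b = [10.0, 12.0]
    lower, upper = lu_decompose(a)
    print(lu_solve(lower, upper, b))  # [1.0, 2.0]
```

The inner `sum(...)` terms are chains of multiply-accumulate operations. The factorization costs O(N^3) of these and each substitution costs O(N^2), so the factorization dominates. A hardware design such as a systolic array would typically spread this arithmetic across many processing elements. How this paper maps it onto its array is not described in this abstract.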
| Field | Value |
|---|---|
| Original language | English (US) |
| Title of host publication | Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors |
| Publisher | Institute of Electrical and Electronics Engineers Inc. |
| Number of pages | 2 |
| State | Published, Jan 1, 2014 |