Sharing multi-cycle hardware blocks like the DSP48E1 primitive in Xilinx FPGAs can result in significant resource savings, but complicates scheduling. For high-throughput, DSP blocks must be pipelined, which results in a high initiation interval (II) for resource shared implementations. In this paper, we propose a resource reduction technique that minimises DSP block usage while also offering improved II over traditional approaches. This is integrated in a high-level tool which takes datapath descriptions in C and generates synthesisable Verilog RTL with different levels of resource sharing. We demonstrate significantly improved throughput compared to traditional resource sharing while achieving resource reduction compared to resource unconstrained and HLS implementations. The approach explores an otherwise infeasible design space between resource unconstrained and traditional resource sharing methods.
|Original language||English (US)|
|Title of host publication||FPL 2016 - 26th International Conference on Field-Programmable Logic and Applications|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|State||Published - Sep 26 2016|