Memory hierarchy latency is one of the main problems that prevents processors from achieving high performance. To eliminate the need of loading/storing large sets of data, Resistive Associative Processors (ReAP) have been proposed as a solution to the von Neumann bottleneck. In ReAPs, logic and memory structures are combined together to allow inmemory computations. In this paper, we propose a new algorithm to compute the matrix multiplication inside the memory that exploits the benefits of ReAP. The proposed approach is based on the Cannon algorithm and uses a series of rotations without duplicating the data. It runs in O(n), where n is the dimension of the matrix. The method also applies to a large set of row by column matrix-based applications. Experimental results show several orders of magnitude increase in performance and reduction in energy and area when compared to the latest FPGA and CPU implementations.
|Original language||English (US)|
|Title of host publication||Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018|
|Publisher||Institute of Electrical and Electronics Engineers Inc.|
|State||Published - Apr 19 2018|