A high throughput FPGA-Based floating point conjugate gradient implementation

22Citations
Citations of this article
22Readers
Mendeley users who have this article in their library.
Get full text

Abstract

As Field Programmable Gate Arrays (FPGAs) have reached capacities beyond millions of equivalent gates, it becomes possible to accelerate floating-point scientific computing applications. One type of calculation that is commonplace in scientific computation is the solution of systems of linear equations. A method that has proven in software to be very efficient and robust for finding such solutions is the Conjugate Gradient algorithm. In this paper we present a parallel hardware Conjugate Gradient implementation. The implementation is particularly suited for accelerating multiple small to medium sized dense systems of linear equations. Through parallelization it is possible to convert the computation time per iteration for an order n matrix from Θ(n 2) cycles for a software implementation to Θ(n). I/O requirements are scalable and converge to a constant value with the increase of matrix order. Results on a VirtexII-6000 demonstrate sustained performance of 5 GFLOPS and projected results on a Virtex5-330 indicate sustained performance of 35 GFLOPS. The former result is comparable to high-end CPUs, whereas the latter represents a significant speedup. © 2008 Springer-Verlag Berlin Heidelberg.

Cite

CITATION STYLE

APA

Lopes, A. R., & Constantinides, G. A. (2008). A high throughput FPGA-Based floating point conjugate gradient implementation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4943 LNCS, pp. 75–86). https://doi.org/10.1007/978-3-540-78610-8_10

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free