Montgomery multiplication normally spends over 90% of its execution time in inner loops executing some kind of multiply-and-add operations. The performance of these critical code sections can be greatly improved by customizing the processor's instruction set for low-level arithmetic functions. In this paper, we investigate the potential of architectural enhancements for multiple-precision Montgomery multiplication according to the so-called Finely Integrated Product Scanning (FIPS) method. We present instruction set extensions to accelerate the FIPS inner loop operation based on the availability of a multiply/accumulate (MAC) unit with a wide accumulator. Finally, we estimate the execution time of a 1024-bit Montgomery multiplication on an extended MIPS32 core and discuss the impact of the multiplier latency. © Springer-Verlag Berlin Heidelberg 2003.
CITATION STYLE
Großschädl, J., & Kamendje, G. A. (2003). Architectural enhancements for montgomery multiplication on embedded RISC processors. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2846, 418–434. https://doi.org/10.1007/978-3-540-45203-4_32
Mendeley helps you to discover research relevant for your work.