Abstract
Toom-Cook multiplication algorithm is one of the most efficient method compared to other traditional large-integer multiplication algorithms. However, in most cases, implementation of this algorithm in hardware is practically avoided due to presence of exact-divisions at intermediate steps of computation. Thus this paper proposes ways to overcome the limitation by designing a hardware-optimized, division-free Toom-4 multiplication algorithm. This is done by eliminating the effects of division due to odd factors in the denominator using a scaling factor, at the interpolation step of the algorithm. Further optimization is done by replacing the direct point-wise multiplication with an optimized Schoolbook multiplication. The overall performance of the proposed architecture is estimated using Area-Time-Product (ATP) and hardware implementation is done using Virtex-7 FPGA device in Xilinx-ISE platform. Practically, the proposed design performs better in terms of speed and resource utilization for higher input bits compared to other state-of-the-art designs; for example for 512 input bits, the percentage ATP of the proposed design is 75.077%, 98.822% and 57.316% superior performance compared to Traditional Schoolbook Multiplication, Karatsuba (Toom-2) multiplication and Toom-4 multiplication respectively.
Data availability
Data are available and can be provided under reasonable request.
Code availability
Algorithm related to the code is provided. However, the code can also be made available under reasonable request.
References
F. Antognazza, A. Barenghi, G. Pelosi, R. Susella, Performance and efficiency exploration of hardware polynomial multipliers for post-quantum lattice-based cryptosystems. SN Comput. Sci. 5, 212 (2024)
H. Becker, J. Bermudo Mera, A. Karmakar, J. Yiu, I. Verbauwhede, Polynomial multiplication on embedded vector architectures. IACR Transact. Cryptogr. Hardw. Embed. Syst. 2022, 482–505 (2021)
G. Bilardi, L. De Stefani, The I/O complexity of toom-cook integer multiplication. Proceedings Of The Thirtieth Annual ACM-SIAM Symposium On Discrete Algorithms. pp. 2034-2052 (2019)
M. Das, B. Jajodia, Hardware design of optimized large integer schoolbook polynomial multiplications on FPGA. 2022 IEEE 19th Int. SoC Design Conf. (ISOCC). pp. 1-2 (2022)
J. Ding, S. Li, Z. Gu, High-speed ECC processor over NIST prime fields applied with toom-cook multiplication. IEEE Trans. Circuits Syst. I Reg. Papers. 66, 1003–1016 (2019)
M. Elia, Loss of Precision in Implementations of the Toom-Cook Algorithm. Graduate College Dissertations And Theses. (2021)
Francisco Pajuelo-Holguera, José M. Granado-Criado, Juan A. Gómez-Pulido, Fast montgomery modular multiplier using FPGAs. IEEE Embedded Sys. Lett. 14, 19–22 (2022)
T. Fritzmann, J. Sepúlveda, Efficient and Flexible Low-Power NTT for Lattice-Based Cryptography. 2019 IEEE Int. Symp. On Hardware Oriented Security And Trust (HOST). pp. 141-150 (2019)
Z. Gu, S. Li, A division-free Toom-Cook multiplication-based montgomery modular multiplication. IEEE Trans. Circuits Syst. II Exp. Briefs. 66, 1401–1405 (2019)
M. Imran, Z. Abideen, S. Pagliarini, An Open-source Library of Large Integer Polynomial Multipliers. 2021 24th International Symposium On Design And Diagnostics Of Electronic Circuits & Systems (DDECS). pp. 145-150 (2021)
M. Imran, Z. Abideen, S. Pagliarini, A versatile and flexible multiplier generator for large integer polynomials. J. Hardw. Syst. Secur. 7, 55 (2023)
A. Jose Maria Bermudo Mera, I. Verbauwhede, Time-memory trade-off in Toom-Cook multiplication: an application to module-lattice-based cryptography. Cryptology EPrint Archive, Paper 2020/268. (2020)
R. Kadu, D. Adane, Hardware Implementation of Elliptic Curve Cryptosystem Using Optimized Scalar Multiplication (Smart Trends In Computing And Communications, Springer, 2020), pp.307–316
A. Karatsuba, Y. Ofman, Multiplication of multidigit numbers on automata. Soviet Phys. Doklady. 7, 595–596 (1963)
M. Kronenburg, Toom-Cook Multiplication: Some Theoretical and Practical Aspects. arXiv:1602.02740 (2016)
C. Lee, P. Meher, and W. Lee, Subquadratic space complexity digit-serial multiplier over binary extension fields using Toom-Cook algorithm. 2014 Int. Symp. On Integrated Circuits (ISIC). pp. 176-179 (2014)
R. Liu, S. Li, A low area ASIC implementation of 272 bit multiplier. 2017 International Conference On Electron Devices And Solid-State Circuits (EDSSC). pp. 1-2 (2017)
P. Montgomery, Modular multiplication without trial division. Math. Comput. 44, 519–521 (1985)
NIST. (2020). NIST Post-Quantum Cryptography Standardization: Round 3 Submissions. Kernel Description. Available at: https://csrc.nist.gov/Projects/Post-Quantum-Cryptography/Round-3-Submissions
D. Putranto, R. Wardhani, H. Larasati, H. Kim, Space and time-efficient quantum multiplier in post quantum cryptography era. IEEE Access 11, 21848–21862 (2023)
A. Qadir, N. Varol, A Review Paper on Cryptography. 2019 7th International Symposium On Digital Forensics And Security (ISDFS). pp. 1-6 (2019)
C. Rafferty, M. O’Neill, N. Hanley, Evaluation of large integer multiplication methods on hardware. IEEE Transact. Comput. 66, 1369–1382 (2017)
B. Rashidi, Throughput/area efficient implementation of scalable polynomial basis multiplication. J. Hardw. Syst. Secur. 4, 120 (2020)
A. Toom, The complexity of a scheme of functional elements simulating the multiplication of integers. Soviet Math. Doklady. 7, 714–716 (1963)
Q. Tian, Y. Wang, G. Liu, X. Liu, J. Diao, H. Xu, Hardware-Efficient Parallel FIR Filter Structure Based on Modified Cook-Toom Algorithm. 2018 IEEE Asia Pacific Conf. On Circuits And Syst. (APCCAS). pp. 342-345 (2018)
J. Wang, C. Yang, F. Zhang, Y. Meng, Y. Su, TCPM a reconfigurable and efficient toom-cook-based polynomial multiplier over rings using a novel compressed postprocessing algorithm. IEEE Transact. Very Large Scale Integr. (VLSI) Syst. 31, 1153–1166 (2023)
Funding
No funding was received for conducting this study.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
There is no Conflict of interest among the authors.
Ethics approval and consent to participate
The submitted work is original and not have been published elsewhere in any form or language.The manuscript will not be submitted elsewhere until the editorial process is completed.
Consent for publication
The manuscript in part or in full has not been submitted or published anywhere. In other words, the authors should ensure that the manuscript is not a duplicate publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Das, M., Jajodia, B. ATP-Optimized Implementation of Four-Way Toom-Cook Multiplications on FPGAs for Large Integer Arithmetic. Circuits Syst Signal Process (2025). https://doi.org/10.1007/s00034-024-02978-7
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s00034-024-02978-7