Synthesis of Parallel-Pipeline Devices for Vertical Computation of Basic Multi-Operand Neural Operations

2025; pp. 190–208

1,2 Lviv Polytechnic National University, Lviv, Ukraine

This paper studies the hardware implementation of basic multi-operand neural operations for artificial neural networks. The operational basis of neural networks is identified, comprising groups of preprocessing operations, processor operations, and transfer function computations. The selection of basic multi-operand neural operations is justified: finding extreme values in one-dimensional arrays, calculating the sum of squared differences, computing the scalar product, and group summation. Vertical computation methods for these operations are improved through simultaneous processing of the bit slices of all operands and adaptive variation of operation complexity across pipeline stages, which synchronizes data arrival time with computation time and yields high equipment utilization. An integrated approach to developing parallel-pipeline devices is proposed, based on the capabilities of the modern element base and the requirements of specific applications. The principles of developing parallel-pipeline devices for vertical-group computation are defined: use of a basis of elementary arithmetic operations, modularity, pipelining, spatial parallelism, structural uniformity, coordination of timing parameters, and specialization for specific tasks. A serial-parallel data format converter is developed for transforming a word-sequential input data stream into a parallel one-dimensional data array. Basic parallel-pipeline structures are created that implement the vertical computation algorithms in hardware. The synthesis method for parallel-pipeline devices is improved through mechanisms that coordinate pipeline cycle duration with data arrival time. It is shown that applying the developed methods and structures ensures real-time data processing with high equipment utilization efficiency.
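To make the vertical (bit-slice) principle concrete, the following minimal C sketch emulates in software the search for the maximum of a one-dimensional array: each step examines one bit slice of all operands simultaneously, mirroring a single combinational stage of the hardware device, and operands that can no longer be the maximum are pruned from the candidate set. This is an illustrative model only, not the paper's device; the names N, W, vertical_max, and vertical_dot are hypothetical, and operands are assumed to be unsigned integers.

#include <stdint.h>
#include <stdio.h>

#define N 8   /* number of operands (hypothetical array size, N < 32) */
#define W 8   /* operand width in bits */

/* Vertical (bit-slice) search for the maximum of N unsigned W-bit numbers.
 * Slices are processed from the most significant bit down; the inner loop
 * over operands models what the hardware performs in parallel. */
uint8_t vertical_max(const uint8_t x[N])
{
    uint32_t candidates = (1u << N) - 1;   /* all operands start as candidates */
    uint8_t max = 0;

    for (int bit = W - 1; bit >= 0; --bit) {
        uint32_t ones = 0;                 /* candidates with a 1 in this slice */
        for (int i = 0; i < N; ++i)
            if ((candidates >> i & 1u) && (x[i] >> bit & 1u))
                ones |= 1u << i;
        if (ones) {                        /* a 1 in the slice eliminates the 0s */
            candidates = ones;
            max |= (uint8_t)(1u << bit);   /* the maximum has a 1 in this slice */
        }
    }
    return max;
}

The same slice-at-a-time view applies to the multi-operand scalar product: the j-th bit slice of all multipliers selects which multiplicands enter the slice sum, and the accumulator weights each slice by 2^j. Again, this is a software model of the computation scheme under the same assumptions, not the hardware structure itself.

/* Vertical scalar product of two N-element vectors of unsigned W-bit numbers. */
uint32_t vertical_dot(const uint8_t a[N], const uint8_t b[N])
{
    uint32_t acc = 0;
    for (int bit = 0; bit < W; ++bit) {    /* one pipeline stage per bit slice */
        uint32_t slice_sum = 0;
        for (int i = 0; i < N; ++i)        /* parallel in hardware */
            if (b[i] >> bit & 1u)
                slice_sum += a[i];
        acc += slice_sum << bit;           /* weight the slice by 2^bit */
    }
    return acc;
}

int main(void)
{
    const uint8_t x[N] = { 17, 200, 3, 99, 200, 5, 64, 128 };
    printf("max = %u\n", vertical_max(x));     /* prints: max = 200 */
    printf("dot = %u\n", vertical_dot(x, x));  /* sum of squares of x */
    return 0;
}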
