Method of Synthesis of Devices for Parallel Stream Calculation of Scalar Product in Real Time

I. G. Tsmots; Yu. V. Opotyak; Bohdan Shtohrinets

A graph scheme of a generalized algorithm for parallel stream calculation of the scalar product has been developed. The algorithm uses operations of a single type to form partial products, starting from the lowest digits of the multipliers. Each step of the algorithm forms the partial products, calculates the macro-partial product, and adds it to the partial result shifted to the right by the number of digits used in forming the partial products. It is proposed that FPGA structures of devices for parallel stream calculation of the scalar product be developed according to the following principles: use of conveyor (pipeline) steps of the same type; performing calculations with addition, inversion, and shift operations; calculating the scalar product as a single operation; regularity and locality of connections between conveyor steps; coordination of the conveyor cycle duration with the data input time and the result output time; and space-time parallelization of the scalar product calculation. An algorithm and structure of a parallel stream device for calculating the scalar product with direct formation of partial products based on the analysis of one digit of the multipliers, which ensures operation with the shortest conveyor cycle, have been developed. An algorithm and structure of a parallel stream device for calculating the scalar product with the formation of partial products for the sum of two pairs of products with the analysis of one digit of the multipliers, which is advisable for a small number of operands, have been developed.
An algorithm and structure of a parallel stream device for calculating the scalar product with the formation of partial products according to the modified Booth algorithm have been developed; they reduce equipment costs when processing operands of n≥24 bits. An algorithm and structure of a device for calculating the scalar product with the formation of group partial products have been developed; they provide the lowest equipment costs for n=8 and N>8. A method for the synthesis of FPGA devices for parallel stream calculation of the scalar product in real time has been developed. The method achieves high efficiency of equipment use by selecting the algorithm for forming partial products and the device structure from the developed set and by coordinating the conveyor cycle of the selected structure with the arrival time of the input data.
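
The generalized algorithm described above can be illustrated with a minimal software sketch. It is not the authors' FPGA implementation. It assumes unsigned multipliers of `n_bits` bits, assumes that the macro-partial product is the sum of the partial products over all operand pairs, and uses hypothetical names (`scalar_product_lsb_first`, `xs`, `ws`, `k`) that do not appear in the paper. The parameter `k` is the number of multiplier digits analyzed per step; `k=1` corresponds to the analysis of one digit.

```python
def scalar_product_lsb_first(xs, ws, n_bits, k=1):
    """Sketch of a digit-serial scalar product, lowest digits first.

    xs     -- multiplicands (any integers)
    ws     -- multipliers, assumed unsigned and smaller than 2**n_bits
    n_bits -- multiplier width in bits
    k      -- multiplier digits analyzed per step (n_bits must divide evenly)
    """
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    if k <= 0 or n_bits % k != 0:
        raise ValueError("n_bits must be a positive multiple of k")
    if any(w < 0 or w >= (1 << n_bits) for w in ws):
        raise ValueError("multipliers must be unsigned n_bits-wide values")

    mask = (1 << k) - 1
    steps = n_bits // k
    acc = 0   # partial result held in the accumulator
    low = 0   # result bits already shifted out of the accumulator

    for i in range(steps):
        if i > 0:
            # Shift the partial result right by k digits,
            # keeping the shifted-out bits as final low-order result bits.
            low |= (acc & mask) << ((i - 1) * k)
            acc >>= k
        # Partial products for the current k-digit group of every multiplier,
        # summed into one macro-partial product (an assumption of this sketch).
        macro = sum(x * ((w >> (i * k)) & mask) for x, w in zip(xs, ws))
        acc += macro

    # Reassemble the full-width result from the accumulator and the saved bits.
    return (acc << ((steps - 1) * k)) | low


if __name__ == "__main__":
    print(scalar_product_lsb_first([3, 5, 7], [2, 4, 6], n_bits=4, k=1))  # 68
    print(scalar_product_lsb_first([3, 5, 7], [2, 4, 6], n_bits=4, k=2))  # 68
```

Each loop iteration corresponds to one step of the algorithm, and the number of steps is `n_bits // k`. A larger `k` reduces the number of steps at the price of a wider partial-product former in each step.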

space-time parallelization

graph scheme of the generalized algorithm

equipment costs

coordination of the conveyor cycle
