Comparative analysis of the specialized software and hardware for deep learning algorithms

Yuriy Khoma; A. Bench

Automated translation, speech recognition and synthesis, object detection, and emotion recognition are well-known complex tasks that a modern smartphone can solve. This became possible through the intensive use of artificial intelligence and machine learning algorithms, of which deep neural networks and deep learning are currently the most popular. Such algorithms are widely used across all industry verticals and require hardware accelerators as well as close cooperation between the software and hardware parts. This requirement has become especially pressing as cloud-based algorithms are embedded into systems with limited computing capabilities, small physical size, and extremely low power consumption. The aim of this paper is to compare existing software and hardware solutions dedicated to the development of artificial neural networks and deep learning applications. The paper focuses on three topics: deep learning software frameworks, specialized GPU-based hardware, and the prospects of deep learning acceleration using FPGAs. The most popular software frameworks, namely Caffe, Theano, Torch, MXNet, TensorFlow, Neon, and CNTK, are compared and analyzed. The advantages of GPU solutions based on CUDA and cuDNN are described. The prospects of FPGAs as high-speed and power-efficient solutions for deep learning algorithm design, especially in combination with the OpenCL language, are discussed.

artificial intelligence

deep learning algorithms

artificial neural networks

software solutions
