Audio Processing | Academic Journals and Conferences

Edge-Ready Speech Separation with SuDo-TasNet

This article presents a hybrid speech separation model designed for efficient deployment on edge devices, balancing separation performance against computational cost. The proposed architecture combines the strengths of the Conv-TasNet and SuDoRM-RF models, leveraging their fully convolutional structures to achieve efficient separation with minimal resource usage. On the clean Libri2Mix dataset, the model achieves an SI-SDRi of 10.59 dB with only 1.17 M parameters and 0.92 GMACs/s.
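For readers unfamiliar with the metric reported above, the following is a minimal sketch of how SI-SDR and its improvement (SI-SDRi) are commonly computed. It follows the standard definition of scale-invariant SDR and is not taken from the article itself; the function names and the toy signals are illustrative assumptions.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length signals."""
    n = len(reference)
    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything that is not explained by the scaled reference counts as error.
    error = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    error_energy = sum(v * v for v in error)
    return 10.0 * math.log10((target_energy + eps) / (error_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain of the separated estimate over the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy example (assumed data): a target tone plus an interfering tone.
N = 200
reference = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
interferer = [math.sin(2 * math.pi * 13 * t / N) for t in range(N)]
mixture = [s + v for s, v in zip(reference, interferer)]
# A hypothetical separator that suppresses the interferer by a factor of 10.
estimate = [s + 0.1 * v for s, v in zip(reference, interferer)]

print(f"SI-SDRi: {si_sdr_improvement(estimate, mixture, reference):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric suits models whose output gain is arbitrary.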

Models and Methods for Speech Separation in Digital Systems

The main purpose of the article is to describe state-of-the-art approaches to speech separation and to demonstrate the structure of such systems and the challenges of building and training them. Designing an efficient, optimized neural network model for speech recognition requires an encoder-decoder structure with a mask estimation flow. The fully convolutional SuDoRM-RF model demonstrates high efficiency with a relatively small number of parameters and can be further accelerated by hardware that supports convolutional operations.
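The encoder-decoder structure with mask estimation can be illustrated with a minimal sketch. It is not the article's model: the filters here are fixed rather than learned, the encoder has no nonlinearity, and the masks are supplied by hand where a real system would use a separator network (such as the convolutional blocks of Conv-TasNet or SuDoRM-RF). All function names are hypothetical.

```python
def encode(signal, filters, stride):
    """Strided 1-D convolution: one coefficient per filter for each frame."""
    length = len(filters[0])
    frames = []
    for start in range(0, len(signal) - length + 1, stride):
        window = signal[start:start + length]
        frames.append([sum(w * x for w, x in zip(f, window)) for f in filters])
    return frames


def apply_mask(frames, mask):
    """Element-wise product of the latent representation and one source mask."""
    return [[c * m for c, m in zip(frame, row)] for frame, row in zip(frames, mask)]


def decode(frames, basis, stride, out_len):
    """Transposed 1-D convolution: weighted basis signals, overlap-added."""
    out = [0.0] * out_len
    for i, frame in enumerate(frames):
        start = i * stride
        for coeff, b in zip(frame, basis):
            for j, value in enumerate(b):
                out[start + j] += coeff * value
    return out


def separate(mixture, filters, basis, stride, masks):
    """Encode once, then mask and decode separately for each source."""
    frames = encode(mixture, filters, stride)
    return [decode(apply_mask(frames, m), basis, stride, len(mixture)) for m in masks]


# Toy setup (assumed): identity filters, so encoding is a plain framing step.
identity = [[1.0, 0.0], [0.0, 1.0]]
mixture = [1.0, -2.0, 3.0, 0.5]
# Hand-written masks standing in for the separator output; they sum to one.
mask_a = [[1.0, 0.0], [0.25, 1.0]]
mask_b = [[1.0 - m for m in row] for row in mask_a]

source_a, source_b = separate(mixture, identity, identity, 2, [mask_a, mask_b])
print(source_a, source_b)
```

Since the decoder is linear, masks that sum to one at every position yield source estimates that add back up to the decoded mixture, a convenient sanity check for this kind of pipeline.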