Utilization of Voice Embeddings in Integrated Systems for Speaker Diarization and Malicious Actor Detection

This paper explores diarization systems, which use machine learning algorithms to detect and separate the different speakers in an audio recording, as the basis for an intruder detection system. Several state-of-the-art diarization toolkits, including Nvidia’s NeMo, Pyannote, and SpeechBrain, are compared. Their performance is evaluated using standard diarization metrics, such as diarization error rate (DER) and Jaccard error rate (JER).
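To make the DER metric concrete, the following is a minimal illustrative sketch, not the evaluation code used in this work. DER is commonly defined as the sum of missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch makes several simplifying assumptions: speaker labels in the hypothesis are already mapped to the reference labels, there is no overlapping speech, and no forgiveness collar is applied. The names `to_frames` and `frame_der` are hypothetical helpers introduced only for this example.

```python
from typing import List, Optional, Tuple

# A segment is (start_seconds, end_seconds, speaker_label).
Segment = Tuple[float, float, str]


def to_frames(segments: List[Segment], duration: float,
              step: float = 0.01) -> List[Optional[str]]:
    """Rasterize segments onto a fixed frame grid; None marks non-speech."""
    n_frames = int(round(duration / step))
    frames: List[Optional[str]] = [None] * n_frames
    for start, end, speaker in segments:
        first = int(round(start / step))
        last = min(n_frames, int(round(end / step)))
        for i in range(first, last):
            frames[i] = speaker  # assumption: no overlapping speech
    return frames


def frame_der(reference: List[Segment], hypothesis: List[Segment],
              duration: float, step: float = 0.01) -> float:
    """Frame-level DER = (missed + false alarm + confusion) / reference speech.

    Assumes hypothesis labels are already aligned with reference labels;
    a full scorer would first search for the optimal label mapping.
    """
    ref = to_frames(reference, duration, step)
    hyp = to_frames(hypothesis, duration, step)

    missed = sum(1 for r, h in zip(ref, hyp) if r is not None and h is None)
    false_alarm = sum(1 for r, h in zip(ref, hyp) if r is None and h is not None)
    confusion = sum(1 for r, h in zip(ref, hyp)
                    if r is not None and h is not None and r != h)
    total_speech = sum(1 for r in ref if r is not None)

    if total_speech == 0:
        raise ValueError("reference contains no speech")
    return (missed + false_alarm + confusion) / total_speech


if __name__ == "__main__":
    reference = [(0.0, 5.0, "A"), (5.0, 10.0, "B")]
    hypothesis = [(0.0, 4.0, "A"), (4.0, 9.0, "B")]
    # 1 s of confusion (4-5 s) plus 1 s of missed speech (9-10 s) over 10 s.
    print(frame_der(reference, hypothesis, duration=10.0))  # 0.2
```

In practice, established scoring tools additionally resolve the speaker mapping and handle overlapped speech. JER is computed per speaker from the Jaccard index between reference and hypothesis speech and is not covered by this sketch.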