speaker identification

THE EFFECT OF AUDIO SIGNAL DURATION ON THE ACCURACY OF SPEAKER VOICE IDENTIFICATION

This paper investigates the capability of a system based on voice embeddings to identify speakers. We use a set of audio recordings from five speakers and construct clips of varying durations, from 5 to 600 seconds. Voice embeddings are extracted with a pyannote-audio neural network, after which similarity coefficients are computed between embeddings of clips from the same speaker (intra-speaker similarity) and from different speakers (inter-speaker dissimilarity).
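
The comparison described above can be illustrated with a minimal sketch. It assumes the similarity coefficient is cosine similarity, which the abstract does not state, and it uses made-up three-dimensional toy vectors in place of real pyannote-audio embeddings. The names `cosine_similarity`, `mean_similarities`, and `toy_embeddings` are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_similarities(embeddings_by_speaker):
    """Return (mean intra-speaker, mean inter-speaker) similarity.

    embeddings_by_speaker maps a speaker label to a list of clip embeddings.
    Every unordered pair of clips is compared exactly once.
    """
    labelled = [
        (speaker, emb)
        for speaker, embs in embeddings_by_speaker.items()
        for emb in embs
    ]
    intra, inter = [], []
    for (s1, e1), (s2, e2) in combinations(labelled, 2):
        target = intra if s1 == s2 else inter
        target.append(cosine_similarity(e1, e2))
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Toy data (assumption): two clips per speaker, orthogonal across speakers.
toy_embeddings = {
    "speaker_a": [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]],
    "speaker_b": [[0.0, 1.0, 0.0], [0.0, 3.0, 0.0]],
}
intra_mean, inter_mean = mean_similarities(toy_embeddings)
print(intra_mean, inter_mean)
```

With real embeddings, repeating this computation for each clip duration would show how the gap between the two means changes as clips get longer.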

IoT system for real-time audio information processing

This paper presents the development and investigation of a speech-to-text conversion and speaker identification system based on a Raspberry Pi microcomputer, designed for local audio data processing in environments with limited network connectivity. The system integrates Silero and WebRTC models for voice activity detection, SpeechBrain for speaker identification, and the Whisper family of models for speech recognition.
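
One way such components might be chained is sketched below. The stage order (voice activity detection first, then speaker identification and transcription on each detected segment) is an assumption, since the abstract only lists the components. The three stages are passed in as plain callables and replaced here by stub functions. Nothing in this sketch calls the real Silero, WebRTC, SpeechBrain, or Whisper APIs, and the names `Utterance` and `process_audio` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Utterance:
    start: int      # index of the first sample in the segment
    end: int        # index one past the last sample in the segment
    speaker: str    # label returned by the speaker-identification stage
    text: str       # transcript returned by the speech-recognition stage


def process_audio(
    samples: Sequence[float],
    detect_speech: Callable[[Sequence[float]], List[Tuple[int, int]]],
    identify_speaker: Callable[[Sequence[float]], str],
    transcribe: Callable[[Sequence[float]], str],
) -> List[Utterance]:
    """Run VAD, then label and transcribe each detected speech segment."""
    utterances = []
    for start, end in detect_speech(samples):
        segment = samples[start:end]
        utterances.append(
            Utterance(start, end, identify_speaker(segment), transcribe(segment))
        )
    return utterances


# Stub stages (assumptions) standing in for the real models.
def stub_vad(samples):
    return [(0, 2), (3, 5)]


def stub_speaker_id(segment):
    return "speaker_1"


def stub_asr(segment):
    return f"{len(segment)} samples"


result = process_audio([0.1, 0.2, 0.0, 0.3, 0.4], stub_vad, stub_speaker_id, stub_asr)
for utterance in result:
    print(utterance)
```

Passing the stages as callables keeps the orchestration independent of any particular model, so a lighter or heavier model could be swapped in on a constrained device without changing the loop.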