Speaker Identification and Verification Using Line Spectral Frequencies
State-of-the-art speaker identification and verification (SIV) systems provide near-perfect performance under clean conditions. However, their performance deteriorates in the presence of background noise. Many feature compensation, model compensation and signal enhancement techniques have been proposed to improve the noise robustness of SIV systems. Most of these techniques require extensive training, are computationally expensive or make assumptions about the noise characteristics. Comparatively little attention has been paid to analyzing the relative importance, or speaker-discriminative power, of different speech zones, particularly under noisy conditions.
In this work, automatic, text-independent speaker identification (SI) and speaker verification (SV) systems are proposed using Line Spectral Frequency (LSF) features. The performance of the proposed SI and SV systems is evaluated under various types of background noise. A score-level fusion-based technique is implemented to exploit complementary information from static and dynamic LSF features. The proposed score-level fusion-based SI and SV systems are found to be more robust under noisy conditions.
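Static and dynamic LSF features can be illustrated with a minimal Python/NumPy sketch. It is not the extraction pipeline used in this work. LPC coefficients are estimated per frame with the autocorrelation method and Levinson-Durbin recursion. The LSFs are the angles in (0, π) of the roots of P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1), which lie on the unit circle and interlace when A(z) is minimum phase. The dynamic features are standard regression deltas. All function names and parameter values are hypothetical assumptions, not settings taken from this work: 16 kHz audio, 400-sample (25 ms) Hamming frames with a 160-sample (10 ms) hop, LPC order 10, and a ±2-frame delta window.

```python
import numpy as np


def lpc(frame, order):
    """LPC polynomial [1, a1, ..., ap] of A(z) via autocorrelation + Levinson-Durbin."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    if r[0] <= 0.0:  # silent frame: fall back to A(z) = 1
        return a
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lpc_to_lsf(a):
    """Line spectral frequencies (radians, ascending) of the LPC polynomial A(z)."""
    a_ext = np.concatenate([a, [0.0]])
    p_poly = a_ext + a_ext[::-1]  # P(z) = A(z) + z^-(p+1) A(z^-1)
    q_poly = a_ext - a_ext[::-1]  # Q(z) = A(z) - z^-(p+1) A(z^-1)
    angles = np.concatenate([np.angle(np.roots(p_poly)), np.angle(np.roots(q_poly))])
    # Keep one root per conjugate pair and drop the trivial roots at z = +1 and z = -1.
    eps = 1e-6
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


def deltas(feats, n=2):
    """Regression deltas over +/- n frames; edge frames are repeated as padding."""
    T = feats.shape[0]
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    d = np.zeros_like(feats, dtype=float)
    for k in range(1, n + 1):
        d += k * (padded[n + k:n + k + T] - padded[n - k:n - k + T])
    return d / denom


def lsf_features(signal, frame_len=400, hop=160, order=10):
    """Return (static, dynamic) LSF matrices of shape (num_frames, order)."""
    window = np.hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        frames.append(lpc_to_lsf(lpc(frame, order)))
    static = np.array(frames)
    return static, deltas(static)
```

Each row of `static` holds `order` strictly increasing frequencies in (0, π). The matching row of the dynamic matrix describes how those frequencies move over neighbouring frames, which is the information a score-level fusion of static and dynamic streams can draw on.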
In addition, we investigate the speaker-discriminative power of different speech zones such as vowels, non-vowels and transitions. Rapidly varying regions of speech such as consonant-vowel transitions are found to be most speaker-discriminative in high SNR conditions. Steady, high-energy vowel regions are robust against noise and are hence most speaker-discriminative in low SNR conditions. We show that selectively utilizing features from a combination of transition and steady vowel zones further improves the performance of the score-level fusion-based SI and SV systems under noisy conditions.
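Zone-selective scoring combined with score-level fusion can be sketched in Python/NumPy as follows. This is a hedged illustration under several assumptions, not the method used in this work. It assumes each frame already carries a zone label and per-frame log-likelihood scores against every enrolled speaker model (for example from GMMs), computed separately for the static and dynamic LSF streams. The zone label strings, the normalization of scores across candidate speakers, and the equal fusion weight are all hypothetical choices. The abstract does not specify the scoring model, the normalization, or the fusion weights.

```python
import numpy as np

# Hypothetical zone labels to keep: transitions plus steady vowel regions.
SELECTED_ZONES = {"transition", "vowel"}


def zone_mask(labels, keep=SELECTED_ZONES):
    """Boolean mask marking frames whose zone label is in `keep`."""
    return np.array([lab in keep for lab in labels])


def stream_scores(frame_ll, mask):
    """Average per-frame log-likelihoods (frames x speakers) over selected frames."""
    if not mask.any():
        raise ValueError("no frames fall in the selected zones")
    return frame_ll[mask].mean(axis=0)


def normalize(scores):
    """Zero-mean, unit-variance normalization across candidate speakers."""
    std = scores.std()
    return (scores - scores.mean()) / (std if std > 0 else 1.0)


def fused_identify(ll_static, ll_dynamic, labels, weight=0.5):
    """Weighted sum of normalized static and dynamic stream scores; returns best speaker."""
    mask = zone_mask(labels)
    s = normalize(stream_scores(ll_static, mask))
    d = normalize(stream_scores(ll_dynamic, mask))
    fused = weight * s + (1.0 - weight) * d
    return int(np.argmax(fused)), fused


# Toy example: 4 frames scored against 3 speaker models.
labels = ["vowel", "silence", "transition", "nonvowel"]
ll = np.array([[-5.0, -1.0, -4.0],
               [0.0, -10.0, -10.0],
               [-5.0, -2.0, -4.0],
               [0.0, -10.0, -10.0]])
best, fused = fused_identify(ll, 0.5 * ll, labels)
```

In the toy data, averaging over all frames would favour speaker 0, because the unselected frames dominate. Restricting the average to the transition and vowel frames selects speaker 1. For verification, one would instead compare the fused score of the claimed speaker against a decision threshold. How scores are normalized in that case is a further design choice not covered by this sketch.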