Speaker anonymization modifies the speech signal to protect the identity of a speaker while preserving the linguistic content. Despite the increasing use of children's voices in educational applications, such as oral reading fluency (ORF) assessment, there is little work on anonymizing children's speech. In this work, we investigate the effectiveness of available speaker anonymization methods, drawing on traditional speech-production-based approaches and a neural-codec-based method. We examine the trade-off between privacy protection, measured as the degree of anonymity, and utility preservation, which, in the current context of ORF assessment, covers the segmental and suprasegmental features of children's read speech utterances. We report objective and subjective evaluations using two child-speaker datasets: MPS and SpeechOcean. Our objective evaluation results indicate that the speech-production-based method of vocal tract length normalization coupled with pitch transposition achieves the best balance between privacy and utility. Subjective listening results indicate that naturalness is achievable across methods, while the neural method fails to preserve age characteristics, which are more easily controlled by the speech-production-driven methods.
