
Rongjie Huang
singing voice synthesis
speech-to-speech translation
multimodal
audio-visual learning
music information retrieval
voice conversion
snlp: generation
visual text-to-speech
diffusion transformer
generative spoken language model
text-guided generation
speech language model
text-to-song synthesis
contrastive pre-training
large language modeling
automatic singing voice transcription
self-supervised learning
8 presentations
SHORT BIO
Rongjie Huang is with the College of Computer Science and Software at Zhejiang University. His research interests include generative AI for speech, singing, and audio, as well as spoken language processing.
Presentations

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang and 7 other authors

Robust Singing Voice Transcription Serves Synthesis
Ruiqi Li and 5 other authors

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Yongqi Wang and 5 other authors

Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment
Zhiqing Hong and 7 other authors

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Yongqi Wang and 8 other authors

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
Huadai Liu and 7 other authors

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Rongjie Huang

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis
Yu Zhang and 8 other authors