Rongjie Huang

singing voice synthesis

speech-to-speech translation

multimodal

audio-visual learning

music information retrieval

voice conversion

snlp: generation

visual text-to-speech

diffusion transformer

generative spoken language model

text-guided generation

speech language model

text-to-song synthesis; contrastive pre-training; large language modeling.

automatic singing voice transcription

self-supervised learning

8

presentations

9

number of views

SHORT BIO

Rongjie Huang is with the College of Computer Science and Software at Zhejiang University. Research interests include generative AI for speech, singing, and audio, as well as spoken language processing.

Presentations

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control

Yu Zhang and 7 other authors

Robust Singing Voice Transcription Serves Synthesis

Ruiqi Li and 5 other authors

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer

Yongqi Wang and 5 other authors

Text-to-Song: Towards Controllable Music Generation Incorporating Vocal and Accompaniment

Zhiqing Hong and 7 other authors

Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt

Yongqi Wang and 8 other authors

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer

Huadai Liu and 7 other authors

AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation

Rongjie Huang

StyleSinger: Style Transfer for Out-of-Domain Singing Voice Synthesis

Yu Zhang and 8 other authors
