Publications

(2026). CtrlSpeech: Coarse-to-Fine Control for Expressive Speech Synthesis. Interspeech 2026.

(2024). BAT: Learning to Reason about Spatial Sounds with Large Language Models. ICML 2024.

(2023). emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation.

(2023). Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition.

(2023). Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning. In ASRU 2023.

(2023). Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation. In INTERSPEECH 2023.

(2022). MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. In INTERSPEECH 2023.

(2022). Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition. ASRU 2023.
