ExpressiveSinger: Multilingual and Multi-Style Score-based Singing Voice Synthesis with Expressive Performance Control

This is my internship project at NVIDIA Research from May 2022 to Feb 2023.

[Paper]    [Demo page]


ExpressiveSinger takes a score, lyrics, a style label, and singer information as input and generates expressive, realistic singing. It is a cascade of diffusion models. The pipeline consists of (1) performance control models that generate timing deviations, F0 curves, and loudness curves; (2) an acoustic model that generates mel-spectrograms conditioned on the performance control signals; (3) a DiffWave vocoder that generates the waveform from the mel-spectrograms and F0 curves. The following figure shows the high-level architecture.

[Figure: high-level architecture of the ExpressiveSinger pipeline]

Input:       1. Score; 2. Lyrics; 3. Style; 4. Singer Info

Output:    Expressive and realistic singing
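The three-stage cascade above can be sketched as a simple pipeline. This is a minimal illustrative skeleton, not the paper's actual implementation: all class and function names, the frames-per-note count, the 80 mel bins, and the 256-sample hop are assumptions chosen for the sketch.

```python
# Hypothetical sketch of the ExpressiveSinger cascade (illustrative only).
from dataclasses import dataclass
from typing import List

@dataclass
class PerformanceControls:
    timing: List[float]    # per-note timing deviations (seconds)
    f0: List[float]        # frame-level F0 curve (Hz)
    loudness: List[float]  # frame-level loudness curve (dB)

def control_model(score, lyrics, style, singer) -> PerformanceControls:
    """Stage 1: performance control models predict expressive timing,
    F0, and loudness from the score (dummy values here)."""
    n_frames = len(score) * 10  # assume 10 frames per note for the sketch
    return PerformanceControls(
        timing=[0.0] * len(score),
        f0=[220.0] * n_frames,
        loudness=[-20.0] * n_frames,
    )

def acoustic_model(controls: PerformanceControls, lyrics, singer):
    """Stage 2: acoustic model generates mel-spectrogram frames
    conditioned on the control signals (80 mel bins assumed)."""
    return [[0.0] * 80 for _ in controls.f0]

def diffwave_vocoder(mel, f0):
    """Stage 3: DiffWave vocoder turns mel frames (plus F0) into a
    waveform; a 256-sample hop per frame is assumed here."""
    return [0.0] * (len(mel) * 256)

def synthesize(score, lyrics, style, singer):
    controls = control_model(score, lyrics, style, singer)
    mel = acoustic_model(controls, lyrics, singer)
    return diffwave_vocoder(mel, controls.f0)

wave = synthesize(score=["C4", "D4", "E4"],
                  lyrics="sheng ri kuai le",
                  style="pop",
                  singer="singer_01")
```

The point of the structure is that each stage only sees the outputs of the previous one, so the controls (timing, F0, loudness) act as an interpretable bottleneck between the score and the audio.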

Generated Example
A Happy Birthday Song in Chinese sung by different singers/styles

This song is not in the training data; it is generated from scratch from the score alone.

Note that many of the singers in this demo never sang in Chinese in the training data.


Multilingual & Stylistic Demo

Generated:                                    Ground-Truth:


 

Generated Opera Singing:


© Shuqi Dai 2024  | All Rights Reserved
