Zero-shot Singing Synthesis

Zero-shot Singing Synthesis With Unseen Speech Target

This is my internship project at Adobe Research, which started in the summer of 2023 and is still ongoing. Below are some preliminary results, shared informally. You are welcome to visit my poster at ISMIR 2023 to discuss the research.

Input:

1. Score, lyrics (with the language specified), and style

2. A 5-second speech recording of the target (a voice unseen in the training data)

Output: Singing in the target’s voice
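
As a rough illustration of the inputs listed above, here is a minimal Python sketch of how one request could be represented. Everything in it is an assumption made for illustration: the class name `SingingRequest`, its field names, the note encoding as (MIDI pitch, duration) pairs, and the validation rules are hypothetical and do not describe the project's actual code or interface.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical container for one synthesis request.
# Names and encodings are illustrative, not the project's API.
@dataclass
class SingingRequest:
    notes: List[Tuple[int, float]]  # score as (MIDI pitch, duration in seconds)
    lyrics: str
    language: str                   # e.g. "en", "zh", "it"
    style: str                      # e.g. "pop", "folk", "opera"
    speech_seconds: float           # length of the target speech clip

    def validate(self) -> None:
        """Raise ValueError if a required input is missing or empty."""
        if not self.notes:
            raise ValueError("score must contain at least one note")
        if not self.lyrics.strip():
            raise ValueError("lyrics must not be empty")
        if self.speech_seconds <= 0:
            raise ValueError("speech clip must have positive duration")

    def score_duration(self) -> float:
        """Total length of the score in seconds."""
        return sum(duration for _, duration in self.notes)


request = SingingRequest(
    notes=[(60, 0.5), (62, 0.5), (64, 1.0)],
    lyrics="la la la",
    language="en",
    style="pop",
    speech_seconds=5.0,
)
request.validate()
```

The sketch only groups and sanity-checks the inputs; the synthesis model itself is outside its scope.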

Demo Example

Input Speech Target 1 (Female voice)
Output Singing
- An English Pop Song
- A Chinese Folk Song

Input Speech Target 2 (Female voice)
Output Singing
- An English Pop Song
- A Chinese Folk Song

Input Speech Target 3 (Male voice)
Output Singing
- A Chinese Folk Song
- An English Pop Song
- An Italian Opera Song
