Whisper for CTC rather than Seq2Seq

P-Sood · October 29, 2023, 8:24pm

Hello, as some of you may know, Whisper is a Seq2Seq (encoder-decoder) model, whereas Wav2Vec2ForCTC is a CTC model. The main practical difference is that Whisper needs decoder_input_ids, whereas Wav2Vec2 does not.

My question is: can I take just Whisper's encoder and attach a CTC head on top of it, so that I get both Whisper's generality and CTC-style output? Or would this clearly not work?
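To make the idea concrete, here is a rough sketch of what I have in mind. It is untested on real data: `WhisperEncoderForCTC` is a name I made up (it is not a transformers class), the linear head, the CTC vocabulary size, and the blank index are all my own assumptions, and the demo uses a tiny randomly initialized config so that nothing needs to be downloaded.

```python
import torch
import torch.nn as nn
from transformers import WhisperConfig, WhisperModel


class WhisperEncoderForCTC(nn.Module):
    """Hypothetical wrapper: Whisper encoder + linear CTC head (not part of transformers)."""

    def __init__(self, encoder: nn.Module, hidden_size: int, vocab_size: int, blank_id: int = 0):
        super().__init__()
        self.encoder = encoder
        # One logit vector per encoder frame, over the CTC vocabulary (including blank).
        self.ctc_head = nn.Linear(hidden_size, vocab_size)
        self.blank_id = blank_id

    def forward(self, input_features, labels=None, label_lengths=None):
        # Encoder only: no decoder_input_ids are involved.
        hidden = self.encoder(input_features).last_hidden_state  # (batch, frames, hidden)
        logits = self.ctc_head(hidden)                            # (batch, frames, vocab)

        loss = None
        if labels is not None:
            # F.ctc_loss expects log-probs shaped (frames, batch, vocab).
            log_probs = logits.log_softmax(dim=-1).transpose(0, 1)
            # Assumption: every encoder frame counts as input, including frames
            # that correspond to Whisper's fixed-length padding.
            input_lengths = torch.full(
                (logits.size(0),), logits.size(1), dtype=torch.long
            )
            loss = nn.functional.ctc_loss(
                log_probs, labels, input_lengths, label_lengths,
                blank=self.blank_id, zero_infinity=True,
            )
        return logits, loss


if __name__ == "__main__":
    # Tiny random-weight config for a shape check only (assumed values, not a real checkpoint).
    config = WhisperConfig(
        d_model=64, encoder_layers=2, encoder_attention_heads=2, encoder_ffn_dim=128,
        decoder_layers=1, decoder_attention_heads=2, decoder_ffn_dim=128,
        max_source_positions=50,
    )
    encoder = WhisperModel(config).get_encoder()
    model = WhisperEncoderForCTC(encoder, config.d_model, vocab_size=32)

    # The encoder expects exactly 2 * max_source_positions mel frames.
    feats = torch.randn(2, config.num_mel_bins, 2 * config.max_source_positions)
    labels = torch.randint(1, 32, (2, 5))  # label ids avoid the blank index 0
    logits, loss = model(feats, labels, torch.tensor([5, 3]))
    print(logits.shape, loss.item())
```

With a real checkpoint I would presumably swap the tiny config for something like `WhisperModel.from_pretrained("openai/whisper-small").get_encoder()` and fine-tune the head (and possibly the encoder) with the CTC loss.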

Any help would be appreciated, but a simple yes or no will suffice as well. Thank you in advance!