Hello, as some of you may know, Whisper is a Seq2Seq (encoder-decoder) model, whereas Wav2Vec2ForCTC is a CTC model. The main difference here is that Whisper needs decoder_input_ids, whereas Wav2Vec2 does not.
My question is: can I use just Whisper's encoder and attach a CTC head on top of it, to leverage both Whisper's generality and CTC to get the output I desire? Or would this clearly not work?
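To make the idea concrete, here is a rough sketch of what I have in mind: the Whisper encoder followed by a linear layer that classifies each output frame, trained with CTC loss. The class name `WhisperEncoderForCTC`, the vocabulary size, and the blank id are my own placeholders, not anything that ships with transformers, and the tiny randomly initialised config is only there so the snippet runs without downloading weights. In practice I would take the encoder from a pretrained checkpoint instead.

```python
import torch
import torch.nn as nn
from transformers import WhisperConfig, WhisperModel


class WhisperEncoderForCTC(nn.Module):
    """Hypothetical wrapper: Whisper encoder + linear CTC head (not a transformers class)."""

    def __init__(self, encoder, hidden_size, vocab_size, blank_id=0):
        super().__init__()
        self.encoder = encoder
        # One set of logits per encoder frame; index `blank_id` is the CTC blank.
        self.ctc_head = nn.Linear(hidden_size, vocab_size)
        self.ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)

    def forward(self, input_features, labels=None, label_lengths=None):
        # Only the encoder runs, so no decoder_input_ids are needed.
        hidden = self.encoder(input_features).last_hidden_state  # (batch, frames, hidden)
        logits = self.ctc_head(hidden)                            # (batch, frames, vocab)
        loss = None
        if labels is not None:
            # nn.CTCLoss expects log-probabilities shaped (frames, batch, vocab).
            log_probs = logits.log_softmax(dim=-1).transpose(0, 1)
            # Assumption: every frame counts as valid (Whisper pads inputs to a fixed length).
            input_lengths = torch.full(
                (logits.size(0),), logits.size(1), dtype=torch.long
            )
            loss = self.ctc_loss(log_probs, labels, input_lengths, label_lengths)
        return logits, loss


if __name__ == "__main__":
    # Tiny random config for illustration only; a real run would use something like
    # WhisperModel.from_pretrained("openai/whisper-base").encoder instead.
    config = WhisperConfig(
        d_model=64,
        encoder_layers=1,
        decoder_layers=1,
        encoder_attention_heads=2,
        decoder_attention_heads=2,
        encoder_ffn_dim=128,
        decoder_ffn_dim=128,
        max_source_positions=50,  # encoder then expects 100 input mel frames
    )
    model = WhisperEncoderForCTC(WhisperModel(config).encoder, config.d_model, vocab_size=32)
    features = torch.randn(2, config.num_mel_bins, 100)  # (batch, mel bins, mel frames)
    labels = torch.randint(1, 32, (2, 5))                # label ids, avoiding blank id 0
    logits, loss = model(features, labels, torch.tensor([5, 3]))
    print(logits.shape, loss.item())
```

The head here is randomly initialised, so it would have to be fine-tuned on labelled data before the outputs mean anything.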
Any help would be appreciated, but a simple yes or no will suffice as well. Thank you in advance!