Wav2vec - <s></s> tokens

Brian_Hemmat · January 18, 2022, 7:56pm

The wav2vec 2.0 base 960h model never seems to return a beginning-of-sentence (<s>) or end-of-sentence (</s>) token (nor, so far, the apostrophe ’ or the unknown token). Is that expected? I can’t find this discussed anywhere. Why are those tokens in the decoding dictionary, and why are they options in the final emission matrix? Or am I just feeding in audio that is too difficult for the model to determine EOS/BOS? If so, can someone provide a counter-example?
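
For anyone who wants to reproduce the observation, here is a minimal sketch of one way to check which dictionary entries ever win the per-frame argmax of an emission matrix. The label list and the emission values below are made up for illustration and are not the real model's dictionary or output; with a real model you would pass its per-frame scores and its own label list instead. The function names are hypothetical helpers, not part of any library.

```python
from collections import Counter

# Hypothetical toy vocabulary for illustration only; the real dictionary is larger.
LABELS = ["<pad>", "<s>", "</s>", "<unk>", "|", "A", "B", "'"]


def argmax_label_counts(emission, labels):
    """Count how often each label has the highest score in a frame."""
    counts = Counter()
    for frame in emission:
        # Index of the highest-scoring label in this frame.
        best = max(range(len(frame)), key=frame.__getitem__)
        counts[labels[best]] += 1
    return counts


def never_emitted(emission, labels):
    """Return the labels that never win the argmax in any frame."""
    counts = argmax_label_counts(emission, labels)
    return [label for label in labels if counts[label] == 0]


# Made-up emission matrix: one row per frame, one column per label.
toy_emission = [
    [0.90, 0.01, 0.01, 0.01, 0.02, 0.03, 0.01, 0.01],  # "<pad>" wins
    [0.10, 0.01, 0.01, 0.01, 0.02, 0.80, 0.04, 0.01],  # "A" wins
    [0.10, 0.01, 0.01, 0.01, 0.80, 0.03, 0.03, 0.01],  # "|" wins
    [0.10, 0.01, 0.01, 0.01, 0.02, 0.04, 0.80, 0.01],  # "B" wins
]

print(never_emitted(toy_emission, LABELS))
```

Running this over many utterances and intersecting the results would show whether <s>, </s>, the apostrophe, and the unknown token are truly never the top-scoring label, or only rare in the audio tried so far.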