For torchvision, there are training references describing how the pretrained models were created, so the published results can be reproduced.
Does the same thing exist for torchaudio? I'm specifically looking to train wav2vec2 from scratch.