Help! Has anyone gotten AVSR (Audio-Visual Speech Recognition) to work with the TorchAudio AVSR examples?

Referencing this: https://github.com/pytorch/audio/tree/main/examples/avsr
I have been trying for a few days to get an AV-HuBERT AVSR model working for inference on a video stream and an audio stream, like the models shown here: https://facebookresearch.github.io/av_hubert/

However, the example code provided in the PyTorch audio repository does not appear to have any support for running inference on these models.

Any guidance on getting a pre-trained AVSR model working would be great, as there does not appear to be much support for this kind of setup.