Speech-to-Text (STT) - Hidden layer representations of pretrained model

Hi, I need to use a pre-trained speech-to-text model such as DeepSpeech, and I want to access its hidden layer representations, so preferably the model should be a subclass of nn.Module. I tried DeepSpeech, but all of its instructions and usage examples are based on the CLI. I simply want to load the model from a Python script and access the hidden layers as I run inference.
Transformers from Hugging Face provides a good interface for my problem, but it only has models based on the transformer architecture. I need an RNN- or LSTM-based model. Can someone share example code that loads such a pretrained model and runs inference? (The model should subclass nn.Module.)
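
To make the request concrete, here is a rough sketch of the access pattern I have in mind, using PyTorch forward hooks. The model below (TinySpeechLSTM) is a hypothetical, randomly initialised stand-in and not a real pretrained network; its layer names, feature size, and class count are assumptions chosen only for illustration. With a real nn.Module checkpoint, the hook part should work the same way.

```python
import torch
import torch.nn as nn


class TinySpeechLSTM(nn.Module):
    """Hypothetical stand-in for a pretrained LSTM-based STT model."""

    def __init__(self, n_features=40, n_hidden=64, n_classes=29):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(n_hidden, n_classes)

    def forward(self, features):
        hidden_seq, _ = self.lstm(features)  # (batch, frames, n_hidden)
        return self.classifier(hidden_seq)   # (batch, frames, n_classes)


def capture_activations(model, layer_names):
    """Register forward hooks that store each named layer's output."""
    captured = {}
    handles = []
    modules = dict(model.named_modules())
    for name in layer_names:
        def hook(module, inputs, output, name=name):
            # nn.LSTM returns (output_seq, (h_n, c_n)); keep the per-frame sequence.
            tensor = output[0] if isinstance(output, tuple) else output
            captured[name] = tensor.detach()
        handles.append(modules[name].register_forward_hook(hook))
    return captured, handles


model = TinySpeechLSTM()
# With real weights one would call model.load_state_dict(...) here;
# no checkpoint is assumed in this sketch.
model.eval()

captured, handles = capture_activations(model, ["lstm", "classifier"])

# Dummy input: 1 utterance, 100 frames, 40 acoustic features per frame.
features = torch.randn(1, 100, 40)
with torch.no_grad():
    logits = model(features)

# Remove the hooks once the activations have been collected.
for handle in handles:
    handle.remove()

print(captured["lstm"].shape)
print(logits.shape)
```

Note that a hook on a multi-layer nn.LSTM only exposes the top layer's output sequence. To see every recurrent layer separately, the model would need to be built from stacked single-layer LSTM modules, each with its own hook.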