LSTM model performance depends on batch size even in eval() mode

A bit off-topic, but is

self.lstm.flatten_parameters()

really needed? And if so, shouldn’t it suffice to call it once at the end of __init__()?
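To illustrate what I mean, here is a minimal sketch (not your actual model — class name, sizes, and layers are made up) that calls flatten_parameters() once after constructing the LSTM instead of on every forward pass:

```python
import torch
import torch.nn as nn

class LSTMModel(nn.Module):  # hypothetical model, for illustration only
    def __init__(self, input_size=16, hidden_size=32, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        # Compacts the weights into one contiguous chunk for cuDNN;
        # if the parameters aren't moved/copied afterwards, once here should be enough.
        self.lstm.flatten_parameters()
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)       # x: (batch, seq_len, input_size)
        return self.fc(out[:, -1])  # use the last time step
```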

You mentioned that you tried both batch_first=True and batch_first=False. Did you adjust the shape of X accordingly? What’s the shape of X anyway? Do you perform any reshape() and/or view() operations on X?
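Just to be explicit about what I mean by adjusting the shape (sizes below are made up for the example): the LSTM expects (batch, seq, feature) with batch_first=True, but (seq, batch, feature) with batch_first=False, so X has to be transposed when you switch between the two:

```python
import torch
import torch.nn as nn

batch, seq_len, feat = 4, 10, 8

lstm_bf = nn.LSTM(feat, 32, batch_first=True)
x_bf = torch.randn(batch, seq_len, feat)   # (batch, seq, feature)
out_bf, _ = lstm_bf(x_bf)                  # out_bf: (batch, seq, hidden)

lstm_sf = nn.LSTM(feat, 32, batch_first=False)
x_sf = x_bf.transpose(0, 1)                # (seq, batch, feature)
out_sf, _ = lstm_sf(x_sf)                  # out_sf: (seq, batch, hidden)
```

If you reshape() or view() X into the other layout instead of transposing, the time and batch dimensions get silently mixed up, which would also explain results that change with batch size.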