Differentiating with respect to input: tensor does not require grad and does not have a grad_fn, yet requires_grad is set to True

Check whether `out_lsm.requires_grad == True`. If it is not, some operation inside `self.evaluate` is probably non-differentiable, which detaches the output from the autograd graph and causes `requires_grad` to be `False`.
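Here is a minimal sketch of how a non-differentiable op (using `argmax` as an example) silently breaks the graph, which is the usual cause of that "does not require grad and does not have a grad_fn" error:

```python
import torch

x = torch.randn(3, requires_grad=True)

# A differentiable op: the result carries a grad_fn and gradients can flow.
y = x * 2
print(y.requires_grad, y.grad_fn)  # True, <MulBackward0 ...>

# A non-differentiable op such as argmax returns a tensor that is
# detached from the graph: requires_grad is False and grad_fn is None.
z = x.argmax()
print(z.requires_grad, z.grad_fn)  # False, None

# Calling z.float().backward() here would raise:
# "element 0 of tensors does not require grad and does not have a grad_fn"
```

If `out_lsm` looks like `z` above (no `grad_fn`), trace back through `self.evaluate` to find the op where `requires_grad` flips to `False`.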

You can refer to this FAQ to learn more about `None` gradients: Why are my tensor's gradients unexpectedly None or not None?