log_probs might be detached from the computation graph. Check whether its .grad_fn attribute points to a valid backward function or is None (detached). In the latter case, print the .grad_fn attribute of the intermediate tensors in your model's forward method to find which operation detaches the tensor from the computation graph. Common causes are e.g. rewrapping a tensor via x = torch.tensor(x), moving through another library such as numpy, explicitly calling tensor.detach(), etc.
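
Here is a minimal sketch showing the debugging approach (the model and layer names are made up for illustration); printing .grad_fn after each step in forward reveals where the graph is broken:

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        out = self.fc(x)
        print(out.grad_fn)        # <AddmmBackward0 object ...> -> still attached
        out = torch.tensor(out.detach().numpy())  # rewrap via numpy -> detaches
        print(out.grad_fn)        # None -> the graph is broken here
        log_probs = torch.log_softmax(out, dim=1)
        print(log_probs.grad_fn)  # None, since the input was already detached
        return log_probs

model = Model()
log_probs = model(torch.randn(3, 4))
print(log_probs.grad_fn)  # None -> loss.backward() would not update the model
```

Once you see the first print that returns None, remove the detaching operation (or keep the computation in PyTorch ops) so the gradients can flow back to the parameters.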