RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [256, 10, 128]] is at version 47; expected version 46 instead

The full traceback is below:

  File "F:\Anaconda3\envs\Mypycharm\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "F:\Anaconda3\envs\Mypycharm\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [256, 10, 128]] is at version 47; expected version 46 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
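
For context, here is a minimal sketch of how this class of error can arise with an integer (Long) tensor, and how the anomaly-detection hint from the message is used. It is not taken from my model: `run`, `x`, and `idx` are hypothetical stand-ins, and it assumes the offending tensor is an index passed to `gather` and then updated in place before `backward()` is called.

```python
import torch


def run(update_in_place: bool) -> torch.Tensor:
    # Hypothetical stand-ins: x plays the role of an embedding,
    # idx the role of an integer index tensor kept in some state object.
    x = torch.randn(4, 5, requires_grad=True)
    idx = torch.zeros(4, 1, dtype=torch.long)

    # gather saves idx so that backward knows where to route gradients.
    out = x.gather(1, idx)

    if update_in_place:
        # In-place update: bumps idx's version counter after it was saved.
        idx += 1
    else:
        # Out-of-place update: builds a new tensor, the saved idx is untouched.
        idx = idx + 1

    out.sum().backward()
    return x.grad


# With anomaly detection on, autograd also reports the forward-pass
# traceback of the operation whose backward step failed.
failed = False
message = ""
with torch.autograd.set_detect_anomaly(True):
    try:
        run(update_in_place=True)
    except RuntimeError as err:
        failed = True
        message = str(err)

# The out-of-place variant runs backward without error.
grad = run(update_in_place=False)
```

In this sketch the in-place variant fails in `backward()` while the out-of-place variant succeeds, so a likely place to look in the real code is any `+=`, `[...] = ...`, or trailing-underscore method (such as `scatter_`) applied to a Long tensor that was already used in the forward pass.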

My model’s forward function is below:
def forward(self, input, return_pi=False):

    if self.checkpoint_encoder and self.training:  # Only checkpoint if we need gradients
        embeddings, _ = checkpoint(self.embedder, self._init_embed(input))
    else:
        embeddings, _ = self.embedder(self._init_embed(input))

    _log_p, pi, cost = self._inner(input, embeddings)

    init_lengths, mask = self.problem.get_costs(input, pi)
    init_lengths = init_lengths[:, None]
    final_lengths = cost + init_lengths

    ll = self._calc_log_likelihood(_log_p, pi, mask)
    if return_pi:
        return final_lengths.squeeze(), ll, pi

    return final_lengths.squeeze(), ll