This error is usually raised by a disallowed inplace operation, i.e. one that modifies a tensor that is needed to compute the gradients in the backward pass.
Here is a small example:
import torch
import torch.nn as nn

# works
w = nn.Parameter(torch.randn(1, 1))
x = torch.randn(1, 1)
y = w * x
y.backward()  # works
# fails
w = nn.Parameter(torch.randn(1, 1))
x = torch.randn(1, 1)
y = w * x
x += 1  # inplace op: x is needed to compute the gradient wrt w in the multiplication, so it must not be modified
y.backward()
# > RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 1]] is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
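As the error hint suggests, enabling anomaly detection makes the subsequent backward error point back to the forward operation that produced the modified tensor, which helps locate the offending inplace op in larger models. A minimal sketch:

```python
import torch
import torch.nn as nn

# adds tracebacks to backward errors pointing at the failing forward op
torch.autograd.set_detect_anomaly(True)

w = nn.Parameter(torch.randn(1, 1))
x = torch.randn(1, 1)
y = w * x
x += 1  # inplace modification of a tensor saved for backward
try:
    y.backward()
except RuntimeError as e:
    # the error message now includes a traceback of the forward call
    print(type(e).__name__)
```

Note that anomaly detection slows down the run, so only enable it while debugging.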
Check for these inplace manipulations and replace them with their out-of-place equivalents.
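For example, replacing x += 1 with the out-of-place x = x + 1 creates a new tensor and leaves the original one (which autograd saved for the backward pass) untouched:

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.randn(1, 1))
x = torch.randn(1, 1)
y = w * x
x = x + 1  # out-of-place: x now refers to a new tensor, the saved one is intact
y.backward()  # works
print(w.grad)  # gradient wrt w is the original x
```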