I'm doing a seq2seq project, but when I run it the error below occurs. I have read similar topics, but I still don't know how to fix it. I've been stuck here for days; I hope someone can help me. Thanks a lot!

The full error message is below:

```
Traceback (most recent call last):
  File "C:/Users/cqf/Desktop/试验/run.py", line 172, in <module>
    run(get_options())
  File "C:/Users/cqf/Desktop/试验/run.py", line 158, in run
    train_epoch(
  File "C:\Users\cqf\Desktop\试验\train.py", line 85, in train_epoch
    train_batch(
  File "C:\Users\cqf\Desktop\试验\train.py", line 154, in train_batch
    loss.backward(retain_graph=True)
  File "F:\Anaconda3\envs\Mypycharm\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "F:\Anaconda3\envs\Mypycharm\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [256, 10, 128]] is at version 47; expected version 46 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
```

As you can see, I have already enabled anomaly detection mode, but I'm still confused.
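While reading about anomaly mode, I noticed it records a Python traceback for every op while the graph is being built, so it seems the forward pass has to run inside the context too; wrapping only `loss.backward()` (as in my `train_batch` below) gives no forward trace. A minimal sketch of what I think the correct usage looks like (the `nn.Linear` is just a toy stand-in for my model):

```python
import torch

# Toy stand-in for the real model, just to make the snippet runnable.
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

# Anomaly mode stores a traceback per op during the FORWARD pass, so the
# forward must be inside the context for the "backtrace further above"
# hint in the error message to actually point at the offending op.
with torch.autograd.set_detect_anomaly(True):
    out = model(x)
    loss = out.mean()
    loss.backward()
```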

My code is below:

```
def train_batch(
        model,
        optimizer,
        baseline,
        epoch,
        batch_id,
        step,
        batch,
        tb_logger,
        opts
):
    x, bl_val = baseline.unwrap_batch(batch)
    x = move_to(x, opts.device)
    bl_val = move_to(bl_val, opts.device) if bl_val is not None else None

    # Evaluate model, get costs and log probabilities
    cost, log_likelihood = model(x)

    # Evaluate baseline, get baseline loss if any (only for critic)
    bl_val, bl_loss = baseline.eval(x, cost) if bl_val is None else (bl_val, 0)

    # Calculate loss
    reinforce_loss = ((cost - bl_val) * log_likelihood).mean()
    loss = reinforce_loss + bl_loss

    # Perform backward pass and optimization step
    optimizer.zero_grad()
    with torch.autograd.set_detect_anomaly(True):
        loss.backward(retain_graph=True)

    # Clip gradient norms and get (clipped) gradient norms for logging
    grad_norms = clip_grad_norms(optimizer.param_groups, opts.max_grad_norm)
    optimizer.step()
```
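While searching, I found a tiny toy case (nothing to do with my actual model) that reproduces the exact same message: calling backward through a retained graph after a saved tensor has been modified in place. Could my `retain_graph=True` be related? Sketch:

```python
import torch

w = torch.ones(3, requires_grad=True)
y = (w * w).sum()               # autograd saves w, since d(w*w)/dw needs it
y.backward(retain_graph=True)   # first backward succeeds

with torch.no_grad():
    w.add_(1.0)                 # in-place op bumps w's version counter

try:
    y.backward()                # retained graph still expects the old version
except RuntimeError as e:
    print(e)                    # "... modified by an inplace operation ..."
```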

The forward function is below:

```
def forward(self, input, return_pi=False):
    """
    :param input: (batch_size, graph_size, node_dim) input node features or dictionary with multiple tensors
    :param return_pi: whether to return the output sequences, this is optional as it is not compatible with
    using DataParallel as the results may be of different lengths on different GPUs
    :return:
    """
    if self.checkpoint_encoder and self.training:  # Only checkpoint if we need gradients
        embeddings, _ = checkpoint(self.embedder, self._init_embed(input))
    else:
        embeddings, _ = self.embedder(self._init_embed(input))

    _log_p, pi, cost = self._inner(input, embeddings)
    init_lengths, mask = self.problem.get_costs(input, pi)
    final_lengths = cost + init_lengths[:, None]

    # Log likelihood is calculated within the model since returning it per action does not work well with
    # DataParallel since sequences can be of different lengths
    ll = self._calc_log_likelihood(_log_p, pi, mask)
    if return_pi:
        return final_lengths.squeeze(), ll, pi

    return final_lengths.squeeze(), ll
```

The _inner function, which performs the decoding, is below:

```
def _inner(self, input, embeddings):
    outputs = []
    sequences = []

    state = self.problem.make_state(input)

    # Compute keys, values for the glimpse and keys for the logits once as they can be reused in every step
    fixed = self._precompute(embeddings)

    batch_size = state.ids.size(0)

    # Perform decoding steps
    i = 0
    while not (self.shrink_size is None and state.all_finished()):

        if self.shrink_size is not None:
            unfinished = torch.nonzero(state.get_finished() == 0)
            if len(unfinished) == 0:
                break
            unfinished = unfinished[:, 0]
            # Check if we can shrink by at least shrink_size and if this leaves at least 16
            # (otherwise batch norm will not work well and it is inefficient anyway)
            if 16 <= len(unfinished) <= state.ids.size(0) - self.shrink_size:
                # Filter states
                state = state[unfinished]
                fixed = fixed[unfinished]

        log_p, mask = self._get_log_p(fixed, state)

        # Select the indices of the next nodes in the sequences, result (batch_size) long
        selected = self._select_node(log_p.exp()[:, 0, :], mask[:, 0, :])  # Squeeze out steps dimension

        state = state.update(selected, i)

        # Now make log_p, selected desired output size by 'unshrinking'
        if self.shrink_size is not None and state.ids.size(0) < batch_size:
            log_p_, selected_ = log_p, selected
            log_p = log_p_.new_zeros(batch_size, *log_p_.size()[1:])
            selected = selected_.new_zeros(batch_size)

            log_p[state.ids[:, 0]] = log_p_
            selected[state.ids[:, 0]] = selected_

        # Collect output of step
        outputs.append(log_p[:, 0, :])
        sequences.append(selected)

        i = i + 1

    lengths = state.lengths + state.get_final_cost()
    # Collected lists, return Tensor
    return torch.stack(outputs, 1), torch.stack(sequences, 1), lengths
```

P.S. The project is a little large, so I don't know how to simplify it. If you need the whole code, please contact me!

If you can't see where the problem is, telling me how to find the offending variable would also be very helpful. And one more thing confuses me: what do "version 47" and "version 46" in the error mean?
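Edit: about the version numbers, here is what I pieced together from the autograd docs. Every tensor carries an internal version counter that is incremented by each in-place operation; when autograd saves a tensor for backward it remembers the counter value, and at backward time it compares it against the current one. So "is at version 47; expected version 46" should mean exactly one extra in-place op touched that tensor after it was saved. The counter is visible through the (private, so possibly version-dependent) `_version` attribute:

```python
import torch

t = torch.zeros(3)
print(t._version)   # 0 for a freshly created tensor
t.add_(1.0)         # any in-place op increments the counter
print(t._version)   # 1
t.mul_(2.0)
print(t._version)   # 2
```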