Inplace operation error during backpropagation

Following is the snippet of my code involving forward and backward pass.

final_loss = 0
final_entropy, final_policy_loss, final_value_loss = 0, 0, 0

indexes = self.starting_indexes(num_frames)
memory = memories[indexes]
accuracy = 0

for _ in range(self.args.recurrence):
    obs = obss[indexes]
    preprocessed_obs = self.obss_preprocessor(obs, device=self.device)
    action_step = action_true[indexes]
    mask_step = mask[indexes]
    dist, value, memory = self.acmodel(preprocessed_obs, memory * mask_step)
    entropy = dist.entropy().mean()
    policy_loss = -dist.log_prob(action_step).mean()
    loss = policy_loss - self.args.entropy_coef * entropy
    action_pred = dist.probs.max(1, keepdim=True)[1]
    accuracy += float((action_pred == action_step.unsqueeze(1)).sum()) / len(flat_batch)
    final_loss = final_loss + loss
    final_entropy += entropy
    final_policy_loss += policy_loss
    indexes += 1

final_loss = final_loss / self.args.recurrence

if is_training:

When I run it with recurrence > 1, I get the following error.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

I am not able to spot an in-place operation here. Any help is appreciated. Thanks.

indexes += 1 is in-place, and final_entropy += entropy is in-place.

I don't think that is your issue, as both of these look like Python numbers. However, a correction is:

final_entropy += entropy.item() # makes entropy a python number, instead of a pytorch scalar
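To illustrate the distinction (a minimal sketch, not taken from your code): on a tensor, += calls the in-place add_, but backprop only fails when the modified tensor's old value was saved for the backward pass. exp is one op that saves its output, so it makes a small reproduction of the exact error:

```python
import torch

x = torch.ones(3, requires_grad=True)

# Safe: exp's backward uses its saved *output* y, and y is left untouched.
y = x.exp()
y.sum().backward()
print(x.grad)  # tensor([2.7183, 2.7183, 2.7183])

# Unsafe: y += 1 rewrites y in place, invalidating the value exp saved.
x.grad = None
y = x.exp()
y += 1  # in-place on a tensor autograd still needs
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)  # "...modified by an inplace operation..."
```

By contrast, final_entropy += entropy accumulates with add_, whose backward does not need the pre-update value, so that line is usually harmless even though it is technically in-place.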

Yes, those are Python numbers. However, I am still unable to find any in-place operation on a torch tensor that would be involved in the backprop triggered by final_loss.backward().
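One way to narrow this down (a sketch, assuming a reasonably recent PyTorch): run the forward pass under torch.autograd.set_detect_anomaly(True). The backward error then includes a second traceback pointing at the forward-pass line whose output was later modified in place. Illustrated on a toy reproduction rather than your model:

```python
import torch

# Anomaly mode records forward-pass stack traces, so the eventual
# RuntimeError names the op whose saved output was modified in place.
with torch.autograd.set_detect_anomaly(True):
    x = torch.ones(3, requires_grad=True)
    y = x.exp()
    y += 1  # the in-place culprit
    try:
        y.sum().backward()
    except RuntimeError as e:
        print(e)  # traceback now points at the exp() line
```

If the trace lands inside self.acmodel, a cheap check is whether the model mutates its memory argument in place; passing (memory * mask_step).clone() (hypothetical change, just for diagnosis) would tell you, since cloning breaks the aliasing.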