Hi, I have the following network, whose forward function is shown here:

```
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentPNetwork(nn.Module):
    ''' Recurrent policy '''

    def __init__(self, state_space, action_space, hidden_space=64):
        super(RecurrentPNetwork, self).__init__()
        self.hidden_space = hidden_space
        self.fc1 = nn.Linear(state_space, hidden_space)
        self.rnn = nn.LSTM(hidden_space, hidden_space)  # ,batch_first=True)
        self.fc2 = nn.Linear(hidden_space, action_space)
        self.hidden_memory = []  # stores (h, c) from every step

    def forward(self, x):
        x = F.relu(self.fc1(x))
        if len(self.hidden_memory) == 0:
            h_t = None  # LSTM then defaults h_0 and c_0 to zeros
        else:
            h_t = self.hidden_memory[-1]
        x, (new_h, new_c) = self.rnn(x, h_t)
        new_h = new_h.detach().requires_grad_()
        new_c = new_c.detach().requires_grad_()
        self.hidden_memory.append((new_h, new_c))
        out = F.softmax(self.fc2(new_h), dim=-1)
        return out
```

Here I encode the action history in the hidden state, which I use to estimate the probability of the next action, so I have to unroll the model manually, one step at a time. When I optimize the model I want the history of hidden states to be taken into account, which is why I introduced the hidden_memory list to store them.
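For context, this is roughly how I drive the network, one environment step at a time. The sizes and the random states below are placeholders, not my real environment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentPNetwork(nn.Module):
    '''Same network as above, condensed.'''
    def __init__(self, state_space, action_space, hidden_space=64):
        super().__init__()
        self.fc1 = nn.Linear(state_space, hidden_space)
        self.rnn = nn.LSTM(hidden_space, hidden_space)
        self.fc2 = nn.Linear(hidden_space, action_space)
        self.hidden_memory = []

    def forward(self, x):
        x = F.relu(self.fc1(x))
        h_t = self.hidden_memory[-1] if self.hidden_memory else None
        x, (new_h, new_c) = self.rnn(x, h_t)
        new_h = new_h.detach().requires_grad_()
        new_c = new_c.detach().requires_grad_()
        self.hidden_memory.append((new_h, new_c))
        return F.softmax(self.fc2(new_h), dim=-1)

state_space, action_space = 4, 2               # placeholder sizes
policy = RecurrentPNetwork(state_space, action_space)

for t in range(5):                             # 5 dummy environment steps
    state = torch.randn(1, 1, state_space)     # (seq_len=1, batch=1, features)
    probs = policy(state)                      # action distribution for this step
    action = torch.distributions.Categorical(probs.squeeze()).sample()
```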

At first, h_0 and c_0 are initialized to zeros and the model runs for one episode, which works. When the second episode begins I get this error: `one of the variables needed for gradient computation has been modified by an inplace operation`.

This happens at the LSTM cell because h_t is now taken from the memory list. One way to silence the error is to use detach() or detach().requires_grad_(), but then I miss out on those gradients and the model doesn't learn. What should I do?
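For what it's worth, here is a tiny standalone check (not my real code, just a toy LSTM) of why the detach workaround loses the gradients:

```python
import torch
import torch.nn as nn

rnn = nn.LSTM(3, 3)                 # toy LSTM, sizes are arbitrary
x = torch.randn(1, 1, 3)

out, (h, c) = rnn(x)
h = h.detach().requires_grad_()     # the same trick as in forward()
h.sum().backward()                  # pretend-loss built only from h

# the gradient stops at the detached leaf tensor: h.grad is filled in,
# but the LSTM's own parameters receive nothing
grads_reached_lstm = any(p.grad is not None for p in rnn.parameters())
```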