Hello People,
I am trying to train an offline reinforcement learning algorithm. The algorithm I am using follows this paper:
In this algorithm, four different neural networks are trained simultaneously; one of them is a Q-function approximated by a deep neural network. This is the architecture of my Q-network:
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super(Critic, self).__init__()
        # Q1 head
        self.l1 = nn.Linear(state_dim + action_dim, 400)
        self.l2 = nn.Linear(400, 300)
        self.l3 = nn.Linear(300, 1)
        # Q2 head
        self.l4 = nn.Linear(state_dim + action_dim, 400)
        self.l5 = nn.Linear(400, 300)
        self.l6 = nn.Linear(300, 1)

    def forward(self, state, action):
        sa = torch.cat([state, action], 1)
        q1 = F.relu(self.l1(sa))
        q1 = F.relu(self.l2(q1))
        q1 = self.l3(q1)
        q2 = F.relu(self.l4(sa))
        q2 = F.relu(self.l5(q2))
        q2 = self.l6(q2)
        return q1, q2
I use the Adam optimizer to optimize the model parameters:
self.critic_optimizer = torch.optim.Adam(self.critic.parameters(), lr=1e-3)
I compute the loss by:
current_Q1, current_Q2 = self.critic(state, action)
critic_loss = F.mse_loss(current_Q1, target_Q)
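Since the critic has two heads, twin-critic (TD3-style) implementations typically sum the MSE over both outputs so that both sets of layers receive gradients; whether that is what this paper prescribes, I am not sure. For concreteness, with dummy tensors (all shapes and values made up):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
current_Q1 = torch.randn(32, 1)  # dummy predictions from head 1
current_Q2 = torch.randn(32, 1)  # dummy predictions from head 2
target_Q = torch.randn(32, 1)    # dummy Bellman targets

# Sum the MSE of both heads so both Q1 and Q2 layers get gradients
critic_loss = F.mse_loss(current_Q1, target_Q) + F.mse_loss(current_Q2, target_Q)
print(critic_loss.item())
```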
And I do one step of optimization by:
self.critic_optimizer.zero_grad()
critic_loss.backward()
self.critic_optimizer.step()
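For reproducibility, here is the whole update step end-to-end as a standalone snippet (dimensions, batch data, and the target values are dummy placeholders, and the gradient-clipping threshold is an arbitrary choice I added as a common stabilizer, not something from the paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    # Same twin-head architecture as above, repeated so this snippet runs standalone
    def __init__(self, state_dim, action_dim):
        super(Critic, self).__init__()
        self.l1 = nn.Linear(state_dim + action_dim, 400)
        self.l2 = nn.Linear(400, 300)
        self.l3 = nn.Linear(300, 1)
        self.l4 = nn.Linear(state_dim + action_dim, 400)
        self.l5 = nn.Linear(400, 300)
        self.l6 = nn.Linear(300, 1)

    def forward(self, state, action):
        sa = torch.cat([state, action], 1)
        q1 = F.relu(self.l1(sa))
        q1 = F.relu(self.l2(q1))
        q1 = self.l3(q1)
        q2 = F.relu(self.l4(sa))
        q2 = F.relu(self.l5(q2))
        q2 = self.l6(q2)
        return q1, q2

torch.manual_seed(0)
state_dim, action_dim, batch = 8, 2, 32      # dummy dimensions
critic = Critic(state_dim, action_dim)
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)

state = torch.randn(batch, state_dim)
action = torch.randn(batch, action_dim)
target_Q = torch.randn(batch, 1)             # stand-in for the Bellman target

q1, q2 = critic(state, action)
loss = F.mse_loss(q1, target_Q) + F.mse_loss(q2, target_Q)

optimizer.zero_grad()
loss.backward()
# Clipping gradients is a common fix when the critic loss blows up;
# max_norm=1.0 is an arbitrary value for illustration
nn.utils.clip_grad_norm_(critic.parameters(), max_norm=1.0)
optimizer.step()
print(loss.item())
```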
But the critic loss is exploding.
Thanks for your help and support in advance!