Training Loss is growing

Hello People,

I am trying to train an offline reinforcement learning algorithm. The algorithm I am using follows this paper:

In this algorithm, four different neural networks are trained simultaneously. One of them is a Q-function approximated by a deep neural network. This is the architecture of my DQN:

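import torch
import torch.nn as nn
import torch.nn.functional as F

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super(Critic, self).__init__()
        # First Q-network
        self.l1 = nn.Linear(state_dim + action_dim, 400)
        self.l2 = nn.Linear(400, 300)
        self.l3 = nn.Linear(300, 1)

        # Second Q-network
        self.l4 = nn.Linear(state_dim + action_dim, 400)
        self.l5 = nn.Linear(400, 300)
        self.l6 = nn.Linear(300, 1)

    def forward(self, state, action):
        # Concatenate state and action along the feature dimension
        sa = torch.cat([state, action], 1)

        q1 = F.relu(self.l1(sa))
        q1 = F.relu(self.l2(q1))
        q1 = self.l3(q1)

        # Second estimate, returned so that the call site can unpack two values
        q2 = F.relu(self.l4(sa))
        q2 = F.relu(self.l5(q2))
        q2 = self.l6(q2)

        return q1, q2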

I use the Adam optimizer to optimize the model parameters:

self.critic_optimizer = torch.optim.Adam(self.critic.parameters(), lr=1e-3)

I compute the loss by:

current_Q1, current_Q2 = self.critic(state, action)
critic_loss = F.mse_loss(current_Q1, target_Q)
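The computation of target_Q is not shown above. As a minimal sketch, assuming a standard one-step Bellman target taken from a separate target network, the loss could be computed as follows. The names critic_target, reward, not_done, and discount are hypothetical, and small linear layers stand in for the real critic.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the critic and its target copy (not the networks from the post)
state_dim, action_dim, batch_size = 3, 2, 8
critic = nn.Linear(state_dim + action_dim, 1)
critic_target = nn.Linear(state_dim + action_dim, 1)
critic_target.load_state_dict(critic.state_dict())

# Fake batch; reward and not_done are kept at shape (batch, 1) to match the critic output
state = torch.randn(batch_size, state_dim)
action = torch.randn(batch_size, action_dim)
next_state = torch.randn(batch_size, state_dim)
next_action = torch.randn(batch_size, action_dim)
reward = torch.randn(batch_size, 1)
not_done = torch.ones(batch_size, 1)
discount = 0.99  # assumed value, not taken from the post

# Build the target under no_grad so no gradient flows through the target network
with torch.no_grad():
    next_q = critic_target(torch.cat([next_state, next_action], 1))
    target_Q = reward + not_done * discount * next_q

current_Q1 = critic(torch.cat([state, action], 1))

# Shapes must match exactly, otherwise mse_loss broadcasts the two tensors
assert current_Q1.shape == target_Q.shape
critic_loss = F.mse_loss(current_Q1, target_Q)
```

The shape assertion is worth keeping: F.mse_loss broadcasts mismatched shapes such as (batch, 1) against (batch,) with only a warning, which changes the value of the loss.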

And I do one step of optimization by:
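A minimal sketch of such a step, assuming the usual PyTorch zero_grad / backward / step pattern (the exact code used is not shown here; the linear layer and random tensors are hypothetical stand-ins for the critic and a batch of data):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in for the critic: maps a concatenated (state, action) vector to one Q-value
critic = nn.Linear(5, 1)
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Fake batch of inputs and targets with matching (batch, 1) output shape
inputs = torch.randn(8, 5)
target_Q = torch.randn(8, 1)

weights_before = critic.weight.detach().clone()

critic_loss = F.mse_loss(critic(inputs), target_Q)

critic_optimizer.zero_grad()   # clear gradients left over from the previous step
critic_loss.backward()         # backpropagate the critic loss
critic_optimizer.step()        # apply one Adam update

# The parameters should have moved after the update
weights_changed = not torch.equal(weights_before, critic.weight.detach())
```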


But the critic loss is exploding:

Thanks for your help and support in advance!