DQN - exploding loss problem

I am trying to solve an image localization problem similar to the one in the paper below. In short, I am training an agent with DQN to control a bounding box that localizes an object. I pretrained a ResNet on image patches that do and do not contain the object of interest, then added linear layers on top of the ResNet to form the DQN.
The DQN has several actions, such as translating and scaling the bounding box.
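
As a point of reference, the architecture described above could be sketched roughly as follows. This is not the actual model code: the class name `QNetwork`, the hidden size, the number of actions, and the tiny stand-in backbone (used in place of the pretrained ResNet so the snippet runs on its own) are all assumptions. The final layer is left without an activation here, which is the usual choice for Q-values; the post does not say whether the real model does the same.

```python
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Feature extractor followed by linear layers, one Q-value per action."""

    def __init__(self, backbone: nn.Module, feat_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # assumption: no activation on the Q-value output
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)
        # Flatten (batch, C, 1, 1) features to (batch, C) before the linear head.
        return self.head(torch.flatten(feats, 1))


# Hypothetical stand-in for the pretrained ResNet (without its classifier layer).
stand_in_backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),  # output shape: (batch, 8, 1, 1)
)

N_ACTIONS = 8  # hypothetical count, e.g. translations and scalings of the box
q_net = QNetwork(stand_in_backbone, feat_dim=8, n_actions=N_ACTIONS)
q_values = q_net(torch.randn(4, 3, 64, 64))  # one row of Q-values per image patch
```

With a real ResNet, `backbone` would be the pretrained network with its classification layer removed and `feat_dim` set to its feature width.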

The problem I am facing is that the loss explodes: it keeps increasing as training proceeds. With the Adam optimizer, I have tried learning rates ranging from 1e-3 to 1e-12 and batch sizes of 50, 100, and 200. I have also tried Double DQN and prioritized experience replay, but neither stops the loss from growing. I would appreciate any advice on what might be causing this.
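
For anyone comparing against their own setup, a Double DQN update with Adam can be sketched as below. This is a minimal illustration under stated assumptions, not the code from this post: the small fully connected networks stand in for the ResNet-based DQN, the batch is random data, and `STATE_DIM`, `N_ACTIONS`, `GAMMA`, and the learning rate are hypothetical values.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

STATE_DIM, N_ACTIONS, GAMMA = 16, 8, 0.99  # hypothetical values

# Small stand-in networks; the target network is a copy of the online network.
online_net = nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))
target_net = copy.deepcopy(online_net)
optimizer = torch.optim.Adam(online_net.parameters(), lr=1e-4)


def double_dqn_loss(states, actions, rewards, next_states, dones):
    # Q(s, a) for the actions that were actually taken.
    q_sa = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Double DQN: the online net selects the next action,
        # the target net evaluates it.
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # Terminal transitions (dones == 1) do not bootstrap.
        target = rewards + GAMMA * (1.0 - dones) * next_q
    return F.mse_loss(q_sa, target)


# Random batch of 50 transitions, purely for illustration.
batch = 50
states = torch.randn(batch, STATE_DIM)
actions = torch.randint(0, N_ACTIONS, (batch,))
rewards = torch.randn(batch)
next_states = torch.randn(batch, STATE_DIM)
dones = torch.randint(0, 2, (batch,)).float()

loss = double_dqn_loss(states, actions, rewards, next_states, dones)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The target is computed under `torch.no_grad()`, so gradients flow only through `q_sa` and the target network receives no gradient from this update.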


What activations do you apply to the outputs of your linear layers? Can you share the PyTorch model code?