DQN - exploding loss problem

I am trying to solve an image localization problem similar to the one in the paper below. In short, I am training an agent with DQN to control a bounding box so that it localizes an object. I pretrained a ResNet on image patches that either contain or do not contain the object of interest, and then added linear layers on top of the ResNet to form the DQN.
The DQN's discrete actions include things like translating and scaling the bounding box. A rough sketch of this kind of architecture is below.
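```python
# Minimal sketch of the kind of architecture described above, NOT the actual
# model from the post. The ResNet variant, number of actions, and hidden size
# are assumptions for illustration only.
import torch.nn as nn
import torchvision.models as models

class LocalizationDQN(nn.Module):
    def __init__(self, num_actions=9, hidden_dim=512):
        super().__init__()
        # Pretrained ResNet backbone; the classification head is replaced so
        # the network outputs a feature vector instead of class logits.
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()
        self.backbone = backbone
        # Q-value head: no activation on the final layer, since Q-values are
        # unbounded regression targets.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_actions),
        )

    def forward(self, x):
        return self.head(self.backbone(x))
```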

The problem I am facing is an exploding loss: the loss keeps increasing as training progresses. With the Adam optimizer, I have tried learning rates ranging from 1e-3 to 1e-12 and batch sizes of 50, 100, and 200. I have also tried techniques such as double DQN and prioritized experience replay, but the exploding loss persists. I am therefore writing to ask for advice on possible causes. For reference, my update step looks roughly like the sketch below.
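```python
# Rough sketch of a double-DQN update step, assuming a model like the
# LocalizationDQN above and a replay buffer that yields (state, action,
# reward, next_state, done) tensors. The Huber loss and gradient clipping
# are not from the post; they are common stabilizers shown for illustration.
import torch
import torch.nn.functional as F

def double_dqn_update(policy_net, target_net, optimizer, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions actually taken.
    q_values = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double DQN: the online network selects the next action,
        # the target network evaluates it.
        next_actions = policy_net(next_states).argmax(dim=1, keepdim=True)
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * (1.0 - dones) * next_q

    # Huber (smooth L1) loss is less sensitive to outlier TD errors than MSE.
    loss = F.smooth_l1_loss(q_values, targets)

    optimizer.zero_grad()
    loss.backward()
    # Clip gradients so a few large TD errors cannot blow up the update.
    torch.nn.utils.clip_grad_norm_(policy_net.parameters(), 10.0)
    optimizer.step()
    return loss.item()
```

The separate, periodically synced target network and the double-DQN action selection are there to keep the bootstrap targets from chasing the online network's own overestimates, which is one common way Q-values and the loss can run away.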

http://slazebni.cs.illinois.edu/publications/iccv15_active.pdf

Thank you very much

What are the activations on the outputs of your linear layers? Are you able to show the PyTorch model code?