You would have to use ._grad in order to overwrite the gradient directly.
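For illustration, a minimal sketch of that route, rescaling the stored gradients after backward() (the model, shapes and the scale factor here are placeholders of my own, not your actual setup):

import torch

model = torch.nn.Linear(4, 1)            # placeholder model
loss = model(torch.randn(8, 4)).sum()    # placeholder loss
loss.backward()

grad_scale = 0.5                          # hypothetical rescaling factor
with torch.no_grad():
    for p in model.parameters():
        p.grad = p.grad * grad_scale      # overwrite the stored gradient in place

This quickly gets awkward when the weighting is per-sample rather than per-parameter, which is why the loss-side approach below is preferable.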
But you should definitely prefer to change the loss computation instead (it is much simpler and cleaner). smooth_l1_loss is straightforward to rewrite by hand, and you just need an extra step to multiply by your weights before summing over the batch dimension. Something like this:
def forward(self, input, target, weights):
    # elementwise smooth L1: 0.5 * d**2 where |d| < 1, |d| - 0.5 elsewhere
    diff = torch.abs(input - target)
    batch_loss = (diff < 1).float() * 0.5 * diff ** 2 + \
                 (diff >= 1).float() * (diff - 0.5)
    weighted_batch_loss = weights * batch_loss   # apply the per-sample weights
    weighted_loss = weighted_batch_loss.sum()    # reduce over the batch
    return weighted_loss
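Wrapped in an nn.Module it would be used roughly like this (a sketch; the class name WeightedSmoothL1Loss and the tensor shapes are my own assumptions):

import torch
import torch.nn as nn

class WeightedSmoothL1Loss(nn.Module):
    # hypothetical wrapper around the forward() shown above
    def forward(self, input, target, weights):
        diff = torch.abs(input - target)
        batch_loss = (diff < 1).float() * 0.5 * diff ** 2 + \
                     (diff >= 1).float() * (diff - 0.5)
        return (weights * batch_loss).sum()

criterion = WeightedSmoothL1Loss()
pred = torch.randn(8, 4, requires_grad=True)
target = torch.randn(8, 4)
weights = torch.rand(8, 1)        # one weight per sample, broadcast across features
loss = criterion(pred, target, weights)
loss.backward()                   # gradients now carry the per-sample weights

With all weights set to 1 this reduces to F.smooth_l1_loss(pred, target, reduction='sum'), which is a quick way to sanity-check the hand-written version.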