You would have to use ._grad in order to overwrite the gradient directly.
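For illustration, a minimal sketch of that route, rescaling the stored gradients after backward() (the model, shapes and the scale factor here are placeholders of my own, not your actual setup):

import torch

model = torch.nn.Linear(4, 1)            # placeholder model
loss = model(torch.randn(8, 4)).sum()    # placeholder loss
loss.backward()

grad_scale = 0.5                          # hypothetical rescaling factor
with torch.no_grad():
    for p in model.parameters():
        p.grad = p.grad * grad_scale      # overwrite the stored gradient in place

This quickly gets awkward when the weighting is per-sample rather than per-parameter, which is why the loss-side approach below is preferable.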
But you should definitely prefer to change the loss computation instead (it is much simpler and cleaner). smooth_l1_loss is straightforward to rewrite by hand, and you just need an extra step to multiply by your weights before summing over the batch dimension. Something like this:
def forward(self, input, target, weights):
    # elementwise smooth L1: 0.5 * d**2 where |d| < 1, |d| - 0.5 elsewhere
    diff = torch.abs(input - target)
    batch_loss = (diff < 1).float() * 0.5 * diff ** 2 + \
                 (diff >= 1).float() * (diff - 0.5)
    weighted_batch_loss = weights * batch_loss   # apply the per-sample weights
    weighted_loss = weighted_batch_loss.sum()    # reduce over the batch
    return weighted_loss
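Wrapped in an nn.Module it would be used roughly like this (a sketch; the class name WeightedSmoothL1Loss and the tensor shapes are my own assumptions):

import torch
import torch.nn as nn

class WeightedSmoothL1Loss(nn.Module):
    # hypothetical wrapper around the forward() shown above
    def forward(self, input, target, weights):
        diff = torch.abs(input - target)
        batch_loss = (diff < 1).float() * 0.5 * diff ** 2 + \
                     (diff >= 1).float() * (diff - 0.5)
        return (weights * batch_loss).sum()

criterion = WeightedSmoothL1Loss()
pred = torch.randn(8, 4, requires_grad=True)
target = torch.randn(8, 4)
weights = torch.rand(8, 1)        # one weight per sample, broadcast across features
loss = criterion(pred, target, weights)
loss.backward()                   # gradients now carry the per-sample weights

With all weights set to 1 this reduces to F.smooth_l1_loss(pred, target, reduction='sum'), which is a quick way to sanity-check the hand-written version.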