Implementing REINFORCE using gradient scaling

norm · November 2, 2017, 2:35pm

I am trying to learn Pong with REINFORCE by scaling the loss gradients by the rewards, but the agent is not learning anything.
I have not discounted the rewards because I think discounting might not be appropriate for some problems, for example when generating a word sequence.
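For context, here is a minimal sketch of the discounting step I am skipping. The function name `discounted_returns` and the `gamma` parameter are my own illustrative choices, not something from the gist. Setting `gamma=1.0` gives the undiscounted reward-to-go, and it may be worth comparing both settings on Pong.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted reward-to-go for every time step.

    rewards: list of per-step rewards for one episode.
    gamma:   discount factor (hypothetical default); 1.0 means no discounting.
    """
    returns = []
    running = 0.0
    # Walk backwards so each step accumulates the rewards that follow it.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


example = discounted_returns([0.0, 0.0, 1.0], gamma=0.5)
no_discount = discounted_returns([0.0, 0.0, 1.0], gamma=1.0)
```

With discounting, steps far from the reward get less credit. With `gamma=1.0`, every step before the reward gets the same credit.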

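This is the function I use to scale the gradients:

def update_grad(grad):
    # Scale the incoming gradient element-wise by the rewards.
    grad = torch.mul(grad, rewards_tensor)
    return grad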
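To make the intent of the function above concrete, here is a small self-contained sketch. It assumes the function is attached with `register_hook` to a per-step loss tensor, before any sum or mean. The toy `logits`, `actions` and `rewards` are made up for illustration and do not come from the gist. Under that assumption, scaling the gradient in a hook should give the same result as weighting each step's negative log-probability by its reward.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy episode: 4 time steps, 3 possible actions.
logits = torch.randn(4, 3, requires_grad=True)
actions = torch.tensor([0, 2, 1, 2])
rewards = torch.tensor([1.0, -1.0, 1.0, -1.0])
steps = torch.arange(4)

# Variant A: scale the gradient of the per-step loss inside a hook.
log_probs = torch.log_softmax(logits, dim=1)
per_step_loss = -log_probs[steps, actions]            # shape (4,)
per_step_loss.register_hook(lambda grad: grad * rewards)
per_step_loss.sum().backward()
grad_hook = logits.grad.clone()

# Variant B: weight the per-step loss by the reward directly.
logits.grad = None
log_probs_b = torch.log_softmax(logits, dim=1)
weighted_loss = -(log_probs_b[steps, actions] * rewards).sum()
weighted_loss.backward()
grad_weighted = logits.grad.clone()

same = torch.allclose(grad_hook, grad_weighted)
```

If the hook is instead attached after the loss has been reduced to a scalar, the gradient it receives is a scalar too, so it no longer lines up with a per-step rewards tensor. That is one thing I would check first.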

Here is my current implementation:

https://gist.github.com/pj-parag/fce77fe977263bb9360d257c332fe44b

gradient_scaling.py

import numpy as np
import cPickle as pickle
import gym
import torch
import torch.nn as nn
from torch import optim
from torch.autograd import Variable
import torch.nn.functional as F

What is the correct way to do this? I am still learning PyTorch and I am very new to RL.

Thanks a lot,