Implementing REINFORCE using gradient scaling

I am trying to learn Pong with REINFORCE by scaling the loss gradients with the rewards, but the agent is not learning anything.
I have not applied discounting, because I think it might not be correct for some problems, for example when producing a word sequence.
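For reference, discounting and "no discounting" differ only in the factor gamma used when accumulating future rewards into a per-step return. The sketch below assumes a hypothetical helper `returns_to_go` that computes G_t = sum over k >= t of gamma^(k-t) * r_k for one episode. With `gamma=1.0` it gives the undiscounted reward-to-go, so each action still gets credit only for rewards that come after it.

```python
import torch

def returns_to_go(rewards, gamma=1.0):
    """Hypothetical helper: per-step return G_t for a single episode.

    gamma=1.0 -> undiscounted reward-to-go; gamma<1.0 -> discounted.
    """
    returns = []
    running = 0.0
    # Walk backwards so each step adds its reward to the (discounted) future sum.
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return torch.tensor(returns)

undiscounted = returns_to_go([0.0, 0.0, 1.0])            # tensor([1., 1., 1.])
discounted = returns_to_go([0.0, 0.0, 1.0], gamma=0.5)   # tensor([0.2500, 0.5000, 1.0000])
```

In Pong, where a reward of +1 or -1 arrives only at the end of each point, gamma < 1 gives actions close to the point more credit than early ones; whether that is appropriate for other tasks, such as word sequences, is a modeling choice.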

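import torch

# Backward hook: multiplies the incoming gradient elementwise by the rewards.
# rewards_tensor is defined elsewhere and must broadcast to grad's shape.
def update_grad(grad):
    return torch.mul(grad, rewards_tensor)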
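A hook like this can reproduce the usual REINFORCE gradient, but only under two conditions: it must be registered on a tensor with one entry per action taken (the log-probabilities), not on an already-reduced scalar loss, and the loss must be negated so that the optimizer ascends the expected return. The sketch below checks that equivalence on a toy example. The linear policy, observations, actions, and returns are all made up for illustration and are not the poster's setup.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy policy: 4-dim observation -> logits for 2 actions.
policy = torch.nn.Linear(4, 2)
obs = torch.randn(5, 4)                               # 5 made-up time steps
actions = torch.tensor([0, 1, 0, 1, 1])               # made-up actions taken
returns = torch.tensor([1.0, 1.0, -1.0, -1.0, 1.0])   # made-up per-step returns

def log_probs_of_taken_actions():
    dist = torch.distributions.Categorical(logits=policy(obs))
    return dist.log_prob(actions)                     # shape (5,), one per step

# (a) Standard REINFORCE surrogate loss: -sum_t log pi(a_t|s_t) * G_t.
policy.zero_grad()
loss = -(log_probs_of_taken_actions() * returns).sum()
loss.backward()
grad_loss = policy.weight.grad.clone()

# (b) Same gradient via a hook that scales the per-step gradient by the returns.
policy.zero_grad()
logp = log_probs_of_taken_actions()
logp.register_hook(lambda g: g * returns)             # g is -1 for every step here
(-logp.sum()).backward()
grad_hook = policy.weight.grad.clone()

print(torch.allclose(grad_loss, grad_hook))           # prints True
```

Writing the loss explicitly as in (a) is usually simpler to read and debug than a hook, and it makes the sign convention visible.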

Here is my current implementation:

What is the correct way to do this? I am learning PyTorch and am very new to RL.
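One commonly used way to write a REINFORCE update is sketched below, assuming one finished episode per update. The function name `reinforce_update`, the default `gamma=0.99`, and the return normalization are illustrative choices, not requirements of the algorithm. Normalization is a variance-reduction trick that acts like a simple baseline, and passing `gamma=1.0` keeps the returns undiscounted. The toy policy and reward rule in the usage part are hypothetical stand-ins for a Pong network and environment.

```python
import torch

def reinforce_update(optimizer, log_probs, rewards, gamma=0.99, eps=1e-8):
    """One REINFORCE step from a single episode (illustrative sketch).

    log_probs: list of 0-dim tensors, log pi(a_t | s_t) for each action taken
    rewards:   list of floats, the reward received after each action
    """
    # Per-step return-to-go; gamma=1.0 means no discounting.
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.insert(0, running)
    returns = torch.tensor(returns)
    # Optional: normalize returns to reduce gradient variance.
    if len(returns) > 1:
        returns = (returns - returns.mean()) / (returns.std() + eps)
    # Minimizing -log_prob * return is gradient ascent on expected return.
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with a hypothetical toy policy and reward rule.
torch.manual_seed(0)
policy = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
log_probs, rewards = [], []
for t in range(10):
    obs = torch.randn(4)                               # made-up observation
    dist = torch.distributions.Categorical(logits=policy(obs))
    action = dist.sample()
    log_probs.append(dist.log_prob(action))
    rewards.append(1.0 if action.item() == 0 else -1.0)  # made-up reward
loss_value = reinforce_update(optimizer, log_probs, rewards)
```

Note that the log-probabilities must be kept attached to the graph during the episode (no `.item()` or `.detach()`), otherwise no gradient reaches the policy.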

Thanks a lot!