I’m writing a baseline for my model based on REINFORCE. Since I expect a REINFORCE baseline not to work very well, it’s difficult to check whether my implementation is correct. I wrote this simple script to check that I’ve understood how to do REINFORCE in PyTorch.
It trains an MLP to produce 4 simple curves (identity, square, cube and sine) from a 1D input. The network outputs 8 values: 4 means and 4 log standard deviations, together parameterizing 4 univariate Gaussians. I sample an output vector from these distributions and apply REINFORCE to get a loss, using the negative MSE between the sample and the target as the reward.
My question is simply: is this the standard way to apply REINFORCE to Normal distributions, and to distribute the loss over the batch? It seems to work for this simple example, but I want to make sure that I’m not crippling my baseline by misunderstanding something.
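For reference, my understanding is that this implements the standard score-function (REINFORCE) estimator

    ∇_θ E_{a ∼ π_θ}[R(a)] = E_{a ∼ π_θ}[ R(a) · ∇_θ log π_θ(a) ],

with the reward R set to the negative MSE between the sampled output and the target. The loss to minimize is then −log π_θ(a) · R(a), which is where the double negation in the loss line below comes from.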
import torch
from torch import nn
import torch.nn.functional as F
batch = 64
iterations = 50000

# Two-layer MLP, producing 4 means and 4 log standard deviations per input
h = 128
model = nn.Sequential(
    nn.Linear(1, h), nn.Sigmoid(),
    nn.Linear(h, 8)
)
opt = torch.optim.Adam(model.parameters(), lr=0.0005)
for i in range(iterations):
    # random 1D inputs and the four target curves
    x = torch.randn(batch, 1)
    y = torch.cat([x, x ** 2, x ** 3, torch.sin(x)], dim=1)

    res = model(x)
    # first four outputs are the means, last four the log standard deviations
    means, sigs = res[:, :4], torch.exp(res[:, 4:])
    dists = torch.distributions.Normal(means, sigs)
    samples = dists.sample()  # non-differentiable sample

    # REINFORCE: the reward is the negative per-element MSE, so the loss
    # -log_prob * reward becomes log_prob * mse (hence the double negation)
    mloss = F.mse_loss(samples, y, reduction='none')
    loss = - dists.log_prob(samples) * - mloss
    loss = loss.mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

    if i % 1000 == 0:
        print('{: 6} grad'.format(i), list(model.parameters())[0].grad.mean().item())
        print(' ', 'loss', F.mse_loss(samples, y, reduction='none').mean(dim=0))
        print(' ', 'sigs', sigs.mean(dim=0))
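For comparison, here is an equivalent way of writing the REINFORCE step with the reward named explicitly and detached; this is just a sketch of my understanding, not something I've tested beyond the script above:

# the reward is the negative per-element MSE between sample and target;
# detaching makes explicit that only the log_prob term carries gradients
reward = -F.mse_loss(samples, y, reduction='none').detach()

# score-function loss: minimizing -log_prob * reward ascends the expected
# reward; the mean averages over both the batch and the 4 output dimensions
loss = (-dists.log_prob(samples) * reward).mean()

As far as I can tell this computes the same thing, since samples comes from dists.sample() and carries no gradient anyway.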