Hi Adam, I came back to this this week and managed to get a basic SBN working. I took the basic MNIST example (https://github.com/pytorch/examples/blob/master/mnist/main.py) and wrote my own net class.

I’m now trying to implement the MuProp algorithm (https://arxiv.org/abs/1511.05176), which derives a control variate for the REINFORCE signal to reduce its variance; see also http://dustintran.com/blog/muprop-unbiased-backpropagation-for-stochastic-neural-networks

Basically, you subtract from the ‘reward’ a term (the control variate) that correlates with it but does not depend on the sample that generated the reward, and to keep the gradient estimate unbiased you add back the expectation of the term you subtracted. MuProp uses a control variate based on a Taylor expansion around what it calls the mean-field network, which is really just the deterministic ‘standard’ neural net (neurons take their sigmoid values without sampling).
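To check I understand the estimator, here is a toy numpy sketch (a single Bernoulli unit, not the SBN below; the loss `f` and all names are made up for illustration) comparing plain REINFORCE with a first-order Taylor control variate plus the added-back mean:

```python
# Toy demo: gradient of E[f(z)] w.r.t. the logit theta, z ~ Bernoulli(sigmoid(theta)).
# Both estimators should agree in mean (unbiased); the MuProp-style one has lower variance.
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
mu = 1.0 / (1.0 + np.exp(-theta))          # sigmoid(theta)

f = lambda z: (z - 0.7) ** 2               # arbitrary loss on the sample
fprime = lambda z: 2.0 * (z - 0.7)         # its derivative

# Analytic gradient for reference
dmu_dtheta = mu * (1.0 - mu)
true_grad = (f(1.0) - f(0.0)) * dmu_dtheta

n = 200_000
z = rng.binomial(1, mu, size=n).astype(float)
score = z - mu                             # d/dtheta log p(z | theta) for a Bernoulli logit

plain = f(z) * score                       # vanilla REINFORCE
h = f(mu) + fprime(mu) * (z - mu)          # Taylor expansion of f around the mean mu
mean_h_grad = fprime(mu) * dmu_dtheta      # d/dtheta E[h(z)] -- the mean added back
muprop = (f(z) - h) * score + mean_h_grad

print(true_grad, plain.mean(), muprop.mean())   # all approximately equal
print(plain.var(), muprop.var())                # muprop variance is smaller
```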

Below is a single-sample SBN that works with the MNIST example. If you change the line `z1.reinforce(loss_repeated - CV_repeated)` to use just `loss_repeated` as the reward, you get standard REINFORCE for an SBN.

I am OK with implementing the Taylor expansion itself, but I’m wondering how to add back the mean term so that the gradient stays unbiased. Should I register a hook on the parameter gradients that adds the term before the update step? At the moment I call backward() on the deterministic network’s loss inside the forward pass in order to build that hook - can I do this?
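Concretely, by a hook I mean something like this (a minimal standalone sketch with a made-up constant `correction` standing in for the actual MuProp mean term):

```python
import torch

w = torch.randn(3, 2, requires_grad=True)
correction = torch.ones_like(w)      # stand-in for the mean term to add back

# The hook runs during backward(), just before the gradient is accumulated
# into w.grad, so whatever it returns is what the optimizer later sees.
handle = w.register_hook(lambda grad: grad + correction)

x = torch.randn(4, 3)
loss = (x @ w).sum()
loss.backward()

# w.grad is now the ordinary gradient plus the correction
expected = x.sum(0).unsqueeze(1).expand(3, 2) + correction
print(torch.allclose(w.grad, expected))

handle.remove()                      # hooks persist across calls, so remove between steps
```

One thing I notice with this approach is that the hook persists until removed, which presumably matters if I register a fresh one in every forward() call.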

Cheers,

George

PS. I’m aware this is pretty specific and may not be that easy to follow!

```
class SBNBase(nn.Module):
    def __init__(self):
        super(SBNBase, self).__init__()
        # seems to be more flexibility in using Parameters than Linear layers
        self.w1 = Parameter(torch.Tensor(28 * 28, 200))
        self.wlast = Parameter(torch.Tensor(200, 10))

    def expected_loss(self, target, forward_result):
        (a1, mu1, z1), (alast, logprobs_out) = forward_result
        return F.nll_loss(logprobs_out, target)

    def forward(self, x, target):
        x = x.view(-1, 28 * 28)
        a1 = x.mm(self.w1)
        mu1 = F.sigmoid(a1)
        z1 = torch.bernoulli(mu1)  # first hidden layer samples
        alast = z1.mm(self.wlast)
        logprobs_out = F.log_softmax(alast)
        expected_loss = self.expected_loss(target, ((a1, mu1, z1), (alast, logprobs_out)))

        # MuProp Taylor expansion: deterministic (mean-field) forward prop
        deta1 = x.mm(self.w1)
        detmu1 = F.sigmoid(deta1)
        detalast = detmu1.mm(self.wlast)
        detlogprobs_out = F.log_softmax(detalast)
        detexpected_loss = self.expected_loss(target, ((deta1, detmu1, None), (detalast, detlogprobs_out)))
        detexpected_loss.backward()  # can I do this in forward???
        control_var = detexpected_loss.data + torch.sum(detmu1.grad.data * (z1.data - detmu1.data))

        loss_repeated = expected_loss.data.repeat(z1.size())
        CV_repeated = control_var.repeat(z1.size())
        z1.reinforce(loss_repeated - CV_repeated)  # REINFORCE stochastic layer, control variate included

        cvmu = self.w1.grad.data  # Here is where I am confused! Is this gradient from the deterministic network?
        h1 = self.w1.register_hook(lambda grad: grad + cvmu)
        return (alast, logprobs_out), expected_loss
```