Sigmoid Belief Networks

Hi, I was wondering how I should go about implementing an SBN in torch? (references 14 and 7 in, for example)
You guys put in something for stochastic nodes, with the RL example here,
But the neural network itself isn’t actually stochastic; only the output is sampled. I get that this generates the next state (the input to the network), so in a way it’s like a stochastic neural network, but each stochastic node needs a reward, right? Would I just use the same ‘reward’ for each stochastic neuron?


So I adapted the file to learn XOR, with the output sampled and using that to get a ‘reward’ (-loss). This is as opposed to using negative log likelihood as the loss.
A full SBN would also sample the hidden layer, whose neurons are Bernoulli random variables parameterised by sigmoid(affine transform of the previous layer).

Is there an efficient way to implement this as the full stochastic neuron? Would I call a simple one layer nn recursively, since Torch is dynamic?

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.autograd as autograd
from torch.autograd import Variable


class Policy(nn.Module):
    def __init__(self):
        super(Policy, self).__init__()
        self.affine1 = nn.Linear(2, 10)
        self.affine2 = nn.Linear(10, 2)
        self.affine = nn.Linear(2, 2)
        self.saved_outputs = []
        self.rewards = []
    def forward(self, x):
        x = F.sigmoid(self.affine1(x))
        action_scores = self.affine2(x)
        return F.softmax(action_scores)

model = Policy()
optimizer = optim.Adam(model.parameters(), lr=1e-2)

x_input = np.array([[1.0,1.0],[1.0,0.0],[0.0,1.0],[0.0,0.0]])
target = np.array([[0.0], [1.0],[1.0],[0.0]])

for i_episode in range(400):
    for t in range(20):
        ind = np.random.randint(4)
        xin = x_input[ind]
        tar = target[ind]

    x_input_Tensor = torch.from_numpy(xin).float().unsqueeze(0)
    probs = model(Variable(x_input_Tensor)) # prob of y
    output = probs.multinomial() # sampled from softmax
    print(xin, tar,
    model.saved_outputs.append(output) # action is a torch.LongTensor, 0s and 1s

    reward = 1.0*( == tar)
    saved_outputs = model.saved_outputs
    rewards = []
    for r in model.rewards[::-1]:
        R = r
        rewards.insert(0, R)
    rewards = torch.Tensor(rewards)
    rewards = (rewards - rewards.mean()) / rewards.std()
    for output, r in zip(model.saved_outputs, rewards):

autograd.backward(model.saved_outputs, [None for _ in model.saved_outputs])
del model.rewards[:]
del model.saved_outputs[:]

Yeah, you could just generate the probabilities of firing with a Linear followed by a Sigmoid, and then call .bernoulli() on the output. That will sample the activations. Then, when you want to backpropagate the errors, you’ll need to provide a reward for every sampled value. Once you do this, you can just call .backward() on the output, and that should do it. You can reuse a single layer multiple times in a single forward pass; that’s perfectly valid in PyTorch.
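
For readers on a recent PyTorch release, where `.reinforce()` on sampled tensors is no longer available, the same idea can be sketched with `torch.distributions`. This is only a minimal sketch under assumptions, not the code from this thread: `StochasticSigmoidLayer`, the layer sizes and the toy batch are hypothetical. Each example's reward (-loss) is shared by all of its sampled units, because the log-probability is summed over the units before it is multiplied by the reward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Bernoulli


class StochasticSigmoidLayer(nn.Module):
    """Hypothetical helper: Linear -> Sigmoid -> Bernoulli sample."""

    def __init__(self, n_in, n_out):
        super().__init__()
        self.linear = nn.Linear(n_in, n_out)

    def forward(self, x):
        probs = torch.sigmoid(self.linear(x))    # firing probabilities
        dist = Bernoulli(probs=probs)
        z = dist.sample()                         # 0/1 activations, no gradient flows through
        log_prob = dist.log_prob(z).sum(dim=1)    # log p(z | x), one value per example
        return z, log_prob


torch.manual_seed(0)
layer = StochasticSigmoidLayer(2, 10)
readout = nn.Linear(10, 2)

x = torch.tensor([[1.0, 0.0], [0.0, 1.0]])       # toy batch (assumption)
target = torch.tensor([1, 1])

z, log_prob = layer(x)
per_example_loss = F.cross_entropy(readout(z), target, reduction="none")

# Score-function (REINFORCE) surrogate: its gradient w.r.t. the stochastic
# layer's weights is -reward * d log p(z | x) / d weights.
reward = -per_example_loss.detach()
surrogate = -(reward * log_prob).mean()

# The readout layer gets an ordinary gradient from the loss itself.
total = surrogate + per_example_loss.mean()
total.backward()
```

The same module instance can be called several times in one forward pass if weights are to be shared between layers.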

I think you might have a small bug in your XOR example - you only call the optimizer once after all these iterations. Not sure if that’s what you wanted.
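
To illustrate the point about the optimizer, here is a rough sketch of the XOR setup with the parameter update placed inside the episode loop. It assumes a recent PyTorch and uses `torch.distributions.Categorical` in place of `.multinomial()` plus `.reinforce()`. The batch size, episode count and the mean baseline (instead of dividing by the standard deviation) are my own choices, not taken from the code above.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

torch.manual_seed(0)

x_input = torch.tensor([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
target = torch.tensor([0, 1, 1, 0])

model = nn.Sequential(nn.Linear(2, 10), nn.Sigmoid(), nn.Linear(10, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Keep a copy of the initial weights so the effect of the updates can be checked.
before = [p.detach().clone() for p in model.parameters()]

for i_episode in range(50):
    # Draw a small batch of 20 randomly chosen XOR cases.
    ind = torch.randint(0, 4, (20,))
    dist = Categorical(logits=model(x_input[ind]))
    output = dist.sample()                       # sampled prediction, 0 or 1
    reward = (output == target[ind]).float()     # 1 if correct, else 0
    advantage = reward - reward.mean()           # simple mean baseline

    loss = -(advantage * dist.log_prob(output)).mean()

    # One update per episode, inside the loop.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Whether this learns XOR reliably depends on the seed and the number of episodes; the point here is only where `zero_grad()`, `backward()` and `step()` sit relative to the loop.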

Hi Adam, I came back to this this week, and I managed to get the basic SBN working. I took the basic MNIST example and wrote my own net class.

I’m now trying to implement the MuProp algorithm (, to derive a control variate for the REINFORCE signal - something to reduce the variance, also see

Basically, you subtract from the ‘reward’ something (a control variate) that correlates with it (but does not depend on the sample generating the reward). To keep the gradient estimate unbiased, you then add back the mean that this term has contributed. The MuProp algorithm uses a control variate based on a Taylor expansion around what it calls the mean-field network, which is really the deterministic ‘standard’ neural net (neurons take the values of the sigmoids without sampling).
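
A small numerical check can make the "subtract, then add back the mean" step concrete. The sketch below uses a single Bernoulli unit with logit `theta` and a made-up loss `f`; both are assumptions for illustration. The control variate is the first-order Taylor expansion of `f` around the mean `mu`. It is linear in the sample, so its expected contribution to the gradient has the closed form `f'(mu) * mu * (1 - mu)`, which is what ordinary backprop through the deterministic unit would give.

```python
import numpy as np

rng = np.random.default_rng(0)


def f(z):
    # Hypothetical downstream loss as a function of one binary unit.
    return (z + 2.0) ** 2


def f_prime(z):
    return 2.0 * (z + 2.0)


theta = 0.4                                # logit of the unit (assumption)
mu = 1.0 / (1.0 + np.exp(-theta))          # mean-field value, sigmoid(theta)

# Exact gradient of E[f(z)] w.r.t. theta for z ~ Bernoulli(mu).
exact = (f(1.0) - f(0.0)) * mu * (1.0 - mu)

z = (rng.random(200_000) < mu).astype(float)
score = z - mu                             # d log p(z) / d theta

# Plain REINFORCE samples of the gradient.
reinforce = f(z) * score

# Control variate: first-order Taylor expansion of f around mu.
h = f(mu) + f_prime(mu) * (z - mu)

# E[h * score] = f'(mu) * mu * (1 - mu): the deterministic backprop gradient.
correction = f_prime(mu) * mu * (1.0 - mu)
with_cv = (f(z) - h) * score + correction

print("exact gradient:", exact)
print("REINFORCE mean / variance:", reinforce.mean(), reinforce.var())
print("with control variate mean / variance:", with_cv.mean(), with_cv.var())
```

Both estimators average to the exact gradient. For this particular `f` the version with the control variate has a much smaller variance, but that depends on how well the linear expansion tracks `f`.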

Below is a single sample SBN that works with the MNIST example. If you change the line "z1.reinforce(loss_repeated-CV_repeated) " to just have loss_repeated as the reward, you have standard REINFORCE for an SBN.

I am OK with implementing the Taylor expansion, but I’m wondering how I would add back the mean to make the gradient unbiased. Should I use a hook to the parameter gradients to add something to the gradient before the update step? At the moment I call backward() on the deterministic network’s loss inside the forward pass to add this as a hook - can I do this?

PS. I’m aware this is pretty specific and may not be that easy to follow!

from torch.nn import Parameter


class SBNBase(nn.Module):
    def __init__(self):
        super(SBNBase, self).__init__()

        # Raw Parameters seem to give more flexibility than Linear layers.
        self.w1 = Parameter(torch.Tensor(28 * 28, 200))
        self.wlast = Parameter(torch.Tensor(200, 10))

    def expected_loss(self, target, forward_result):
        (a1, mu1, z1), (a2, logprobs_out) = forward_result
        return F.nll_loss(logprobs_out, target)

    def forward(self, x, target):
        x = x.view(-1, 28*28)

        a1 =
        mu1 = F.sigmoid(a1)

        z1 = torch.bernoulli(mu1) # first hidden layer samples

        alast =
        logprobs_out = F.log_softmax(alast)

        expected_loss = self.expected_loss(target, ((a1, mu1, z1), (alast, logprobs_out)))
        # MuProp Taylor expansion, deterministic forward prop
        deta1 =
        detmu1 = F.sigmoid(deta1)
        detalast =
        detlogprobs_out = F.log_softmax(detalast)
        detexpected_loss = self.expected_loss(target, ((a1, mu1, z1), (detalast, detlogprobs_out)))

        detexpected_loss.backward() # can I do this in forward???
        control_var = + torch.sum(*(
        loss_repeated =
        CV_repeated = control_var.repeat(z1.size())
        z1.reinforce(loss_repeated-CV_repeated)     # REINFORCE stochastic layer, control variate included
        cvmu = # Here is where I am confused! Is this gradient from the deterministic network?
        h1 = self.w1.register_hook(lambda grad: grad+cvmu)
        return ((alast, logprobs_out)), expected_loss
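
One possible way to add the mean back without hooks, and without calling `backward()` inside `forward()`, is to fold everything into a single surrogate loss. The sketch below is my own reading of the MuProp estimator for one stochastic hidden layer, written for a recent PyTorch with `torch.distributions` and `torch.autograd.grad`. The class name `MuPropSBN`, the random initialisation and the toy batch are hypothetical. Under this reading, the term added back is the gradient of the deterministic (mean-field) network's loss with respect to `w1`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Bernoulli


class MuPropSBN(nn.Module):
    """Hypothetical one-hidden-layer SBN with a MuProp-style control variate."""

    def __init__(self, n_in=28 * 28, n_hidden=200, n_out=10):
        super().__init__()
        self.w1 = nn.Parameter(0.01 * torch.randn(n_in, n_hidden))
        self.wlast = nn.Parameter(0.01 * torch.randn(n_hidden, n_out))

    def per_example_loss(self, h, target):
        logprobs_out = F.log_softmax(h @ self.wlast, dim=1)
        return F.nll_loss(logprobs_out, target, reduction="none")

    def forward(self, x, target):
        x = x.view(x.size(0), -1)
        mu1 = torch.sigmoid(x @ self.w1)
        dist = Bernoulli(probs=mu1)
        z1 = dist.sample()                        # first hidden layer samples
        loss = self.per_example_loss(z1, target)  # f(z), also trains wlast

        # Deterministic (mean-field) pass: f(mu) and its gradient w.r.t. mu.
        mu_in = mu1.detach().requires_grad_(True)
        det_loss = self.per_example_loss(mu_in, target)
        (grad_mu,) = torch.autograd.grad(det_loss.sum(), mu_in)

        # First-order Taylor expansion of f around mu, evaluated at the sample.
        control_var = det_loss.detach() + (grad_mu * (z1 - mu1.detach())).sum(dim=1)

        log_prob = dist.log_prob(z1).sum(dim=1)
        reinforce_term = (loss.detach() - control_var) * log_prob
        # Mean added back: its gradient w.r.t. w1 is f'(mu) * d mu / d w1.
        mean_term = (grad_mu * mu1).sum(dim=1)

        surrogate = (reinforce_term + mean_term + loss).mean()
        return surrogate, loss.mean().detach()


torch.manual_seed(0)
model = MuPropSBN()
x = torch.rand(4, 1, 28, 28)                      # toy batch (assumption)
target = torch.tensor([3, 1, 4, 1])

surrogate, expected_loss = model(x, target)
surrogate.backward()                              # fills w1.grad and wlast.grad
```

Dropping `control_var` and `mean_term` from the surrogate gives plain REINFORCE for the hidden layer, which makes it easy to compare the two.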