How to transform classification into regression?

(Ajay Talati) #1

Well, here’s a pretty simple problem: how do you go from a

i) classification problem with a single output from your model, and a

loss = nn.CrossEntropyLoss()(output, labels)

ii) to a regression problem with mu and sigma2 (mean and variance) outputs from your model, which then go through

y_pred = torch.normal( mu, sigma2.sqrt() )


loss = F.smooth_l1_loss(y_pred, labels)
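To make the shapes concrete, here is a minimal sketch of the two losses on dummy tensors. The batch size, labels and variable names are illustrative assumptions, not from the original code. The classification loss takes one logit per class plus integer class indices. The regression loss takes one real number per image plus float targets, which is why the labels need a .float(). For simplicity this sketch uses the predicted mean directly instead of sampling from the Gaussian.

```python
import torch
import torch.nn.functional as F

N, num_classes = 4, 10
labels = torch.tensor([3, 0, 7, 9])  # class indices, dtype int64

# i) Classification: one logit per class, integer targets.
logits = torch.randn(N, num_classes, requires_grad=True)
clf_loss = F.cross_entropy(logits, labels)  # same as nn.CrossEntropyLoss()(logits, labels)

# ii) Regression: one real number per image, float targets.
mu = torch.randn(N, requires_grad=True)  # predicted mean per image
reg_loss = F.smooth_l1_loss(mu, labels.float())

clf_loss.backward()
reg_loss.backward()
print(clf_loss.item(), reg_loss.item())
```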

Basically I want to turn an MNIST classifier into a regression exercise whose model outputs a Gaussian distribution. The bit that’s tripping me up is that the output y_pred is now stochastic, so I guess I need a .reinforce() on it, but I still don’t see how to do this.

Here’s the relevant bit of my code:

    def forward(self, x):
        # Set initial states
        h0 = Variable(torch.zeros(self.num_layers*2, x.size(0), self.hidden_size)) # 2 for bidirection 
        c0 = Variable(torch.zeros(self.num_layers*2, x.size(0), self.hidden_size))
        # Forward propagate RNN
        out, _ = self.lstm(x, (h0, c0))
        # Decode hidden state of last time step
        mu = self.mu( out[:, -1, :] )  # assumed name for the mean head; this line was garbled in the original
        sigma2 = self.sigma2( out[:, -1, :] )
        return mu, sigma2

rnn = BiRNN(input_size, hidden_size, num_layers, num_classes)

# Loss and Optimizer
optimizer = torch.optim.Adam(rnn.parameters(), lr=learning_rate)
# Train the Model 
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images.view(-1, sequence_length, input_size))
        labels = Variable( labels.float() )        
        # Forward + Backward + Optimize
        #outputs = rnn(images)
        mu, sigma2 = rnn(images)
        sigma2 = (1 + sigma2.exp()).log() # ensure positivity
        y_pred = torch.normal( mu, sigma2.sqrt() ) 
        y_pred = y_pred.float()
        #y_pred = Variable( torch.normal(mu, sigma2.sqrt()).data.float() )       
        loss = F.smooth_l1_loss( y_pred , labels )
        loss.backward()  # this call raises the error below

and the error I get when calling loss.backward():

  File "", line 90, in <module>
  File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/site-packages/torch/autograd/", line 158, in backward
    self._execution_engine.run_backward((self,), (gradient,), retain_variables)
  File "/home/ajay/anaconda3/envs/pyphi/lib/python3.6/site-packages/torch/autograd/", line 13, in _do_backward
    raise RuntimeError("differentiating stochastic functions requires "
RuntimeError: differentiating stochastic functions requires providing a reward

It’s modified from


Or perhaps I’m making this more complicated than it needs to be with the Gaussian thing? Should I just stick an encoder on the output of the LSTM?

Thanks a lot 🙂

(Thomas V) #2

I’m not sure I understand exactly what you want to do, but would the same reparametrisation trick as in the VAE paper and its implementations (e.g. pytorch/examples) let you use the “usual” procedure?
You would draw standard normal random numbers, wrap them in a Variable, and then transform them with mu and sigma2, i.e. y_pred = mu + sigma2.sqrt() * eps. That way, the random numbers are fixed w.r.t. the differentiation, and gradients flow through mu and sigma2 without needing a reward.
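A minimal sketch of the reparametrisation trick in current PyTorch, where the Variable wrapper is no longer needed. It assumes two hypothetical linear heads, mu_head and sigma2_head, on top of a random feature vector that stands in for the LSTM output in the question; these names and shapes are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the model in the question.
features = torch.randn(8, 16)        # e.g. last LSTM hidden state, batch of 8
mu_head = nn.Linear(16, 1)
sigma2_head = nn.Linear(16, 1)
labels = torch.randint(0, 10, (8,)).float()

mu = mu_head(features).squeeze(1)
# softplus(x) equals (1 + x.exp()).log(), so sigma2 stays positive.
sigma2 = F.softplus(sigma2_head(features)).squeeze(1)

# Reparametrisation: the noise does not depend on any parameter,
# so y_pred is a differentiable function of mu and sigma2.
eps = torch.randn_like(mu)
y_pred = mu + sigma2.sqrt() * eps

loss = F.smooth_l1_loss(y_pred, labels)
loss.backward()  # no reward needed: gradients reach both heads
```

torch.distributions.Normal(mu, sigma2.sqrt()).rsample() performs the same shift-and-scale internally, if you prefer not to write it by hand.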

(Ajay Talati) #3

Hi @tom that’s what I think too!

I’ll give it a try 😉

I’m fed up of all this .reinforce stuff!!!

Just to be a bit clearer: what I want to learn is a mapping from images to single real numbers y_pred, and those real numbers should be as close as possible to the images’ labels (their class indices, the tensor labels), as measured by loss = F.smooth_l1_loss(y_pred, labels).
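If a single real number per image is all that is needed, the deterministic “encoder” option is just a linear head on the last LSTM step with one output. The sketch below assumes the usual MNIST-as-sequence setup (28 rows of 28 pixels); the class name BiRNNRegressor and the hyperparameters are hypothetical, and random tensors stand in for real images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiRNNRegressor(nn.Module):
    """Hypothetical deterministic variant: BiLSTM encoder + linear head -> one number per image."""
    def __init__(self, input_size=28, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(hidden_size * 2, 1)  # 2 for bidirection

    def forward(self, x):
        out, _ = self.lstm(x)  # initial states default to zeros
        return self.head(out[:, -1, :]).squeeze(1)  # last time step -> shape (batch,)

model = BiRNNRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch standing in for MNIST: 8 images as sequences of 28 rows x 28 pixels.
images = torch.randn(8, 28, 28)
labels = torch.randint(0, 10, (8,))

y_pred = model(images)
loss = F.smooth_l1_loss(y_pred, labels.float())
optimizer.zero_grad()
loss.backward()
optimizer.step()
```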


(Adam Paszke) #4

Well, the error should be quite self-explanatory: you haven’t provided a reward for the stochastic output. Call .reinforce(reward) on y_pred, but do it before you cast it! Casts return a new Variable, and that is no longer a stochastic output!
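.reinforce() belongs to the old stochastic-function API, which was later removed from PyTorch in favour of the torch.distributions package. A minimal sketch of the same score-function (REINFORCE) estimator written with torch.distributions follows. The heads and features are the same hypothetical stand-ins as before, and using the negative per-sample smooth L1 error as the reward is an assumption, since the thread never settles on a reward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

torch.manual_seed(0)

features = torch.randn(8, 16)        # stand-in for the LSTM output
mu_head = nn.Linear(16, 1)           # hypothetical heads
sigma2_head = nn.Linear(16, 1)
labels = torch.randint(0, 10, (8,)).float()

mu = mu_head(features).squeeze(1)
sigma2 = F.softplus(sigma2_head(features)).squeeze(1)
dist = Normal(mu, sigma2.sqrt())

y_pred = dist.sample()  # non-differentiable sample, like the old stochastic output
# Assumed reward: negative per-sample regression error (higher is better).
reward = -F.smooth_l1_loss(y_pred, labels, reduction='none')

# Surrogate loss whose gradient is the REINFORCE estimate -reward * grad log p(y_pred).
surrogate = -(reward.detach() * dist.log_prob(y_pred)).mean()
surrogate.backward()
```

The reparametrised version usually gives lower-variance gradients for this kind of problem, because it uses the derivative of the loss rather than only its value.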

(Ajay Talati) #5

Thanks @apaszke !!! That’s helpful.

My confusion is actually conceptual, in the way I set up the problem: I haven’t figured out what the reward should be in this context!

It’s nothing to do with PyTorch; I just haven’t thought carefully enough about what I’m actually trying to do here. I was carrying over an idea from continuous-action reinforcement learning, and it doesn’t seem to make sense in the context of regression?
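If the goal is a regression model that outputs a Gaussian, one common alternative to sampling plus a reward (not something proposed in this thread) is to fit mu and sigma2 by maximum likelihood, i.e. minimise the Gaussian negative log-likelihood of the labels. A minimal sketch, again assuming hypothetical heads on random features:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

features = torch.randn(8, 16)        # stand-in for the LSTM output
mu_head = nn.Linear(16, 1)           # hypothetical heads
sigma2_head = nn.Linear(16, 1)
labels = torch.randint(0, 10, (8,)).float()

mu = mu_head(features).squeeze(1)
sigma2 = F.softplus(sigma2_head(features)).squeeze(1) + 1e-6  # strictly positive variance

# Negative log-likelihood of labels under N(mu, sigma2),
# dropping the constant 0.5 * log(2 * pi).
nll = 0.5 * (sigma2.log() + (labels - mu) ** 2 / sigma2)
loss = nll.mean()
loss.backward()  # deterministic loss: no sampling, no reward
```

Recent PyTorch versions also provide F.gaussian_nll_loss(mu, labels, sigma2), which computes the same quantity with its default settings.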