Multiple output tutorial/examples

I am working with a network that has multiple outputs. I have read this topic: How to do backward() for a net with multiple outputs?, but I still don't understand how to do it. Do you have any clear tutorials or examples of how to do it?

Thanks,

This comment has an example: Multi Label Classification in pytorch

That thread has a lot of detail.

Thanks for your reply, but it is not exactly what I need. My network generates 3 outputs which have different loss functions (MSE, cross entropy, …), or it uses the same loss function but on completely different outputs (e.g. classification from list 1, list 2, and list 3).

Like this one (where Dense = Linear)

Doing multiple outputs + losses should be straightforward. In your forward function of the model, you have the variables corresponding to each output; send them through three separate loss functions.

The outputs of the loss functions can be backpropagated all together using torch.autograd.backward.
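
As a rough illustration, here is a minimal sketch of that pattern; the model, layer sizes, and names below are made up for this example and are not from the thread:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 32)
        self.head_reg = nn.Linear(32, 1)   # regression head, paired with MSE
        self.head_cls1 = nn.Linear(32, 5)  # classification head 1, cross entropy
        self.head_cls2 = nn.Linear(32, 3)  # classification head 2, cross entropy

    def forward(self, x):
        h = torch.relu(self.backbone(x))
        return self.head_reg(h), self.head_cls1(h), self.head_cls2(h)

model = MultiHeadNet()
x = torch.randn(8, 16)
out_reg, out_cls1, out_cls2 = model(x)

# one loss function per output
loss_reg = F.mse_loss(out_reg, torch.randn(8, 1))
loss_cls1 = F.cross_entropy(out_cls1, torch.randint(0, 5, (8,)))
loss_cls2 = F.cross_entropy(out_cls2, torch.randint(0, 3, (8,)))

# backpropagate all three losses together; current PyTorch fills in the
# unit gradients for scalar losses automatically (older versions required
# passing them explicitly, as discussed below)
torch.autograd.backward([loss_reg, loss_cls1, loss_cls2])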

Thanks for your hint. I also found this post. I tried to compute a loss for each output and append it to a list, as in the following code:

loss_seq = []
for o in output:
    cur_loss = criterion(o, target_var)
    loss_seq.append(cur_loss)

Then I printed the losses, which looked correct.

Then I tried to do the backpropagation:

torch.autograd.backward(loss_seq)

However, an error occurred:

 File "example/main.py", line 174, in train
    torch.autograd.backward(loss_seq)
TypeError: backward() takes at least 2 arguments (1 given)

Am I doing something wrong? How can I do backpropagation with multiple losses?

You have to call torch.autograd.backward(loss_seq, grad_seq), where grad_seq contains the gradients for each of the losses.

If you want, you can give all ones as the gradients:
grad_seq = [loss_seq[0].new(1).fill_(1) for _ in range(len(loss_seq))]
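
Putting the two pieces together, a minimal sketch of that call; in current PyTorch, torch.ones_like is the easiest way to build the all-ones gradients (the .new(...) form above is from the older Variable API):

# one unit gradient per loss, with the same shape/dtype/device as each loss
grad_seq = [torch.ones_like(l) for l in loss_seq]
torch.autograd.backward(loss_seq, grad_seq)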

Thank you very much. But why can we use all ones as gradients here?

If the loss is a scalar value, then an all-ones gradient is exactly what loss.backward() uses implicitly.

Thanks: I understand the ones are the gradients dloss/dloss = 1 that are seeded at the very end of the backpropagation graph.
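
A quick sanity check of that identity, with toy numbers made up for illustration: seeding backward() with an explicit 1 gives the same gradient as the plain scalar call.

import torch

x = torch.tensor([3.0], requires_grad=True)
loss = (x ** 2).sum()                  # loss = x^2, so dloss/dx = 2x
loss.backward(torch.ones_like(loss))   # explicit seed: dloss/dloss = 1
print(x.grad)                          # tensor([6.]), same as plain loss.backward()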

But creating grad_seq this way fails with:
AttributeError: 'Variable' object has no attribute 'new'

EDIT: OK, it looks like tensors are automatically converted to Variables, so this seems to work:

grad_seq = [loss_seq[0].data.new(1).fill_(1) for _ in range(len(loss_seq))]

Can a model produce multiple outputs (as in the experiments above) and have only a single loss function?
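
One common pattern (a sketch, not something confirmed in this thread): sum the per-output losses into a single scalar, optionally with weights, and call backward() once.

# hypothetical weights; loss_reg, loss_cls1, loss_cls2 as in the earlier sketch
total_loss = 1.0 * loss_reg + 0.5 * loss_cls1 + 0.5 * loss_cls2
total_loss.backward()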
