I know the basics of PyTorch and I understand neural nets.
I have a set of observations that go through a NN and result in a single scalar. I want to maximise that scalar (i.e. perform gradient ascent so that the expectation is maximised). Is there a torch.nn.*Loss function for this? I can’t see it. One hack would be to define a number that is bigger than anything I could ever see in the output (yes, it’s bounded) and use torch.nn.L1Loss(), but there should be a cleaner way to do this. Maybe I shouldn’t be using a torch.nn.*Loss() function at all?
Just use pytorch to minimize the negative of your scalar.
my_single_scalar = model(input)
my_negative_scalar = -my_single_scalar
(Pytorch optimizers minimize their objective functions. To maximize, you just flip the sign.)
Yes, there is no need to use a torch.nn.ImAtALoss() function. There is nothing special about them. They are just (autograd-supporting) implementations of loss functions commonly used for training. As long as you use pytorch tensor operations that support autograd, you can use your own computation for the loss (including something as simple as the negation above, my_negative_scalar = -my_single_scalar).
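A minimal sketch of such a hand-rolled "loss", assuming a hypothetical tiny linear model standing in for the real network: negating the scalar means the gradients autograd produces point "uphill" for the original objective.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)   # hypothetical model emitting one scalar per input
x = torch.randn(8, 4)           # a batch of observations

# Any autograd-supporting computation can serve as the loss:
loss = -model(x).mean()         # negate, so that minimising == maximising
loss.backward()                 # gradients now favour a *larger* model output
```

No torch.nn.*Loss module is involved anywhere; the negated tensor is the objective.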
Cool! Thank you Sir! I’ve written some code and now I can call my_negative_scalar.backward() and print(x, model.state_dict()) before and after optimizer.step() and see my model change. I really do appreciate a five-line explanation; so much of PyTorch gets wrapped in vast libraries, and it’s a real help to see it simply laid out.
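The before-and-after check described here can be sketched as follows, assuming the same hypothetical stand-in model; comparing state_dict() snapshots confirms that optimizer.step() actually moved the parameters.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)                  # hypothetical stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)

# snapshot the parameters before the update
before = {k: v.clone() for k, v in model.state_dict().items()}

my_negative_scalar = -model(x).mean()
optimizer.zero_grad()
my_negative_scalar.backward()
optimizer.step()

# every parameter that received a non-zero gradient has moved
changed = [k for k in before
           if not torch.equal(before[k], model.state_dict()[k])]
```

For this model, `changed` ends up containing both `weight` and `bias`.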
hi @KFrank, @tonyr , In the case of multi-objective cost functions, where I am trying to minimize one loss but maximize the other, is calculating the total loss by adding them up and doing a total_loss.backward() the correct way to do this?
My cost function has a reconstruction loss and a KL divergence loss between the latent and prior distributions. I want to minimize the first one and maximize the latter. So do I need to backprop separately, or is adding them up just fine? Thanks a lot!
Combining them is probably what you want to do. You can combine them however you like; just call .backward() on the result, for example (partA - scale * partB).backward(). The reason I say it’s probably the right thing is that in ML we make the assumption that what we want to achieve will happen if we minimise some loss. There are some (fairly rare) cases where you want to make more than one call to .backward(), but it looks to me like you are implementing a VAE and this isn’t one of them.
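A minimal sketch of the combined-objective pattern, with hypothetical stand-in tensors rather than a real VAE (part_a plays the role of the reconstruction loss, part_b the term you want to increase, and scale is an arbitrary weighting):

```python
import torch

torch.manual_seed(0)
z = torch.randn(16, 2, requires_grad=True)  # stand-in latent values

part_a = (z ** 2).mean()         # term to minimise (reconstruction-like)
part_b = z.abs().mean()          # term to maximise
scale = 0.5                      # hypothetical weighting

total = part_a - scale * part_b  # one combined objective
total.backward()                 # a single backward call handles both terms
```

A single backward pass on the combined scalar accumulates the gradients of both parts, with the sign flip on part_b baked in.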
Thank you so much for the explanation!!
hi @tonyr , Just a follow-up question on this. How do we select the scale parameter for this, any suggestions? My latent-dimension loss has an entropy loss (which I’m maximizing) and a KL divergence loss (which I’m minimizing). The scales of these two differ by a huge margin. Any suggestions on this? Thanks!
If you are combining two loss functions in this way you generally want both of them to go down a bit (negate, or take the reciprocal of, a term if you want it to increase). Also, generally, there is a value you get when no information flows through the net, as at the start, and a trained value at the end. So start with them both roughly equally weighted, i.e. use something like the ratio of the starting values. Then log the training run and see how both components change; hopefully both go down a bit, but one might go up. Then tweak things to get the behaviour you want. Note that if you have lossA and lossB then (lossA * lossB).backward() and all sorts of other combinations may work well for you.
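The "ratio of the starting values" heuristic can be sketched like this, with two made-up loss terms on deliberately different scales (loss_a and loss_b are hypothetical stand-ins, and the maximised term is already negated so both should decrease):

```python
import torch

torch.manual_seed(0)
w = torch.randn(10, requires_grad=True)   # stand-in parameters

def loss_a(w):                # term to minimise (hypothetical)
    return (w ** 2).mean()

def loss_b(w):                # term to maximise, so it is negated here
    return -w.abs().mean()

# Start with the ratio of the initial magnitudes, so both terms
# carry comparable weight at step 0:
with torch.no_grad():
    scale = (loss_a(w).abs() / loss_b(w).abs()).item()

opt = torch.optim.SGD([w], lr=0.01)
for step in range(100):
    a, b = loss_a(w), loss_b(w)
    total = a + scale * b
    opt.zero_grad()
    total.backward()
    opt.step()
    if step % 25 == 0:        # log both components to watch their behaviour
        print(f"step {step}: loss_a={a.item():.4f}  scale*loss_b={scale * b.item():.4f}")
```

Logging the two components separately, as above, is what lets you see whether one term is dominating and tweak the scale accordingly.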