How to differentiate weighted parameter updates?

Suppose we have a weighted loss (theta are the network's parameters):

o = model(data; theta)
L = wi * F.cross_entropy(o, target, reduction="none"),

where the wi are per-sample weights produced by another network (model_w). Assume we're using SGD to update the parameters of model, so that:
theta'(wi) = theta - lr * wi * grad_theta(F.cross_entropy(o, target, reduction="none"))

If the weight network has a loss that’s of the form:

L_w = loss(data, target, model(data; theta'(wi)))

is there an easy way to backprop into the weight network's parameters, or do I need to implement the backward pass myself?
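For concreteness, here is roughly what I mean in code (a minimal sketch; model, model_w, data, target and lr are placeholders from my setup above):

```python
import torch
import torch.nn.functional as F

# per-sample weights wi produced by the weighting network
wi = model_w(data)

o = model(data)
per_sample = F.cross_entropy(o, target, reduction="none")
L = (wi * per_sample).mean()

# the manual SGD step theta'(wi); create_graph=True keeps wi in the graph
grads = torch.autograd.grad(L, model.parameters(), create_graph=True)
theta_prime = [p - lr * g for p, g in zip(model.parameters(), grads)]

# L_w would then have to be computed with model evaluated at theta_prime(wi),
# which is where backprop into model_w's parameters gets awkward.
```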

Hi,

So the issue is that you want to backprop through the optimizer step?
Well, the problem is that the optimizers in PyTorch are not differentiable :confused: So I would recommend using a library like higher that is built to differentiate through gradient updates!
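For example, a rough sketch with higher (untested, and assuming the setup from your post, with model_w producing the per-sample weights, lr / w_lr as learning rates, and outer_loss standing in for your L_w criterion):

```python
import torch
import torch.nn.functional as F
import higher  # pip install higher

inner_opt = torch.optim.SGD(model.parameters(), lr=lr)
outer_opt = torch.optim.SGD(model_w.parameters(), lr=w_lr)

outer_opt.zero_grad()
# fmodel is a functional copy of model; diffopt applies differentiable updates
with higher.innerloop_ctx(model, inner_opt) as (fmodel, diffopt):
    wi = model_w(data)                                   # per-sample weights
    o = fmodel(data)
    inner_loss = (wi * F.cross_entropy(o, target, reduction="none")).mean()
    diffopt.step(inner_loss)                             # theta'(wi), graph is kept

    # outer loss L_w evaluated at the updated parameters theta'(wi)
    L_w = outer_loss(data, target, fmodel(data))
    L_w.backward()   # gradients flow back into model_w through the inner step

outer_opt.step()
```

After the backward call, model_w's parameters hold gradients that account for how wi influenced the inner SGD update, so you never have to write the backward pass by hand.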

Something like that, yes :slight_smile: I see, I’ll take a look at higher, thanks for the recommendation!