How to differentiate weighted parameter updates?

Suppose we have a weighted loss (theta denotes the network's parameters):

o = model(data; theta)
L = wi * F.cross_entropy(o, target, reduction="none"),

where the wi are the output of another network (model_w). Assume we're using SGD to update the parameters of model, so that:

theta'(wi) = theta - lr * wi * grad_theta(F.cross_entropy(model(data; theta), target, reduction="none"))

If the weight network has a loss of the form:

L_w = loss(data, target, model(data; theta'(wi)))

is there an easy way to backpropagate into the weight network's parameters, or do I need to implement the backward pass through the update myself?
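
For concreteness, here is a rough sketch of what I mean (not my exact code; model_w produces one weight per sample, and lr is the inner learning rate):

```python
import torch
import torch.nn.functional as F

w = model_w(data)  # w_i: one weight per sample
o = model(data)    # o = model(data; theta)
weighted_loss = (w * F.cross_entropy(o, target, reduction="none")).sum()

# One SGD step "by hand", kept in the graph so that theta' depends on w_i:
grads = torch.autograd.grad(weighted_loss, list(model.parameters()),
                            create_graph=True)
theta_prime = [p - lr * g for p, g in zip(model.parameters(), grads)]

# L_w should then be evaluated with theta'(w_i) plugged back into the model,
# which is the part I don't see how to do cleanly with an nn.Module.
```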


The issue is that you want to backprop through the optimizer step?
Well, the issue is that the optimizers in PyTorch are not differentiable :confused: So I would recommend using a library like higher that is built to differentiate through gradient updates!
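
For reference, a minimal sketch of how higher could be used here (assuming model_w outputs per-sample weights, and with placeholder learning rates lr and lr_w):

```python
import torch
import torch.nn.functional as F
import higher  # pip install higher

opt = torch.optim.SGD(model.parameters(), lr=lr)
opt_w = torch.optim.SGD(model_w.parameters(), lr=lr_w)

with higher.innerloop_ctx(model, opt) as (fmodel, diffopt):
    # Inner step: weighted per-sample cross-entropy on a functional copy of model.
    w = model_w(data)  # w_i, assumed shape [batch]
    inner_loss = (w * F.cross_entropy(fmodel(data), target,
                                      reduction="none")).sum()
    diffopt.step(inner_loss)  # differentiable update: fmodel now holds theta'(w)

    # Outer step: evaluate L_w at theta'(w) and backprop into model_w.
    outer_loss = F.cross_entropy(fmodel(data), target)
    opt_w.zero_grad()
    outer_loss.backward()  # gradients flow through the inner update into model_w
    opt_w.step()
```

The key point is that diffopt.step keeps the parameter update inside the autograd graph, so outer_loss.backward() can reach the parameters of model_w through theta'(w).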

Something like that, yes :slight_smile: I see, I’ll take a look at higher, thanks for the recommendation!