How to differentiate weighted parameter updates?

Suppose we have a weighted loss (theta are the network's parameters):

o = model(data; theta)
L = wi * F.cross_entropy(o, target, reduction="none"),

where the wi are per-sample weights produced by another network (model_w). Assume we're using SGD to update the parameters of model, so that:
theta'(wi) = theta - lr * wi * grad_theta(F.cross_entropy(o, target, reduction="none"))

If the weight network has a loss that’s of the form:

L_w = loss(data, target, model(data; theta'(wi)))

is there an easy way to backprop into the weight network's parameters, or do I need to implement the backward pass myself?
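For concreteness, here is roughly what I mean in code (a minimal sketch; model, model_w, data, target and lr are placeholders from my setup above):

```python
import torch
import torch.nn.functional as F

# per-sample weights wi produced by the weighting network
wi = model_w(data)

o = model(data)
per_sample = F.cross_entropy(o, target, reduction="none")
L = (wi * per_sample).mean()

# the manual SGD step theta'(wi); create_graph=True keeps wi in the graph
grads = torch.autograd.grad(L, model.parameters(), create_graph=True)
theta_prime = [p - lr * g for p, g in zip(model.parameters(), grads)]

# L_w would then have to be computed with model evaluated at theta_prime(wi),
# which is where backprop into model_w's parameters gets awkward.
```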

Hi,

So the issue is that you want to backprop through the optimizer step?
Well, the problem is that the optimizers in PyTorch are not differentiable :confused: So I would recommend using a library like higher that is built to differentiate through gradient updates!
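For example, a rough sketch with higher (untested, and assuming the setup from your post, with model_w producing the per-sample weights, lr / w_lr as learning rates, and outer_loss standing in for your L_w criterion):

```python
import torch
import torch.nn.functional as F
import higher  # pip install higher

inner_opt = torch.optim.SGD(model.parameters(), lr=lr)
outer_opt = torch.optim.SGD(model_w.parameters(), lr=w_lr)

outer_opt.zero_grad()
# fmodel is a functional copy of model; diffopt applies differentiable updates
with higher.innerloop_ctx(model, inner_opt) as (fmodel, diffopt):
    wi = model_w(data)                                   # per-sample weights
    o = fmodel(data)
    inner_loss = (wi * F.cross_entropy(o, target, reduction="none")).mean()
    diffopt.step(inner_loss)                             # theta'(wi), graph is kept

    # outer loss L_w evaluated at the updated parameters theta'(wi)
    L_w = outer_loss(data, target, fmodel(data))
    L_w.backward()   # gradients flow back into model_w through the inner step

outer_opt.step()
```

After the backward call, model_w's parameters hold gradients that account for how wi influenced the inner SGD update, so you never have to write the backward pass by hand.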

Something like that, yes :slight_smile: I see, I’ll take a look at higher, thanks for the recommendation!