I have a sequence of operations performed on x to yield y_, where A, B, and C are learnable parameters of my system:
y_ = A(B(C(x)))
Is it possible to define a loss such as:

L = (y - y_) + sum(dy_/dC)

If so, how? Currently, the backward() call requires a scalar value. Also, how can I compute dy_/dC without overwriting the actual update gradients dL/dweights?
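Here is a minimal sketch of what I am after, using torch.autograd.grad with create_graph=True (which, if I understand the docs correctly, returns the gradients directly instead of populating .grad, so the later backward() is not disturbed). The linear layers standing in for A, B, C and the reduction of (y - y_) to a scalar are my own placeholder choices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder modules standing in for the learnable parameters A, B, C
A, B, C = nn.Linear(4, 4), nn.Linear(4, 4), nn.Linear(4, 4)

x = torch.randn(8, 4)
y = torch.randn(8, 4)   # target
y_ = A(B(C(x)))         # forward pass: y_ = A(B(C(x)))

# dy_/dC via autograd.grad: create_graph=True keeps the graph so the
# penalty term is itself differentiable; no .grad fields are written here,
# so the update gradients dL/dweights are not overwritten.
dy_dC = torch.autograd.grad(y_.sum(), C.parameters(), create_graph=True)

penalty = sum(g.sum() for g in dy_dC)

# (y - y_) must be reduced to a scalar (here a plain sum) before backward()
loss = (y - y_).sum() + penalty
loss.backward()
```

After loss.backward(), each parameter's .grad holds dL/dweights, including the contribution flowing through the dy_/dC penalty. Is this the right approach, or is there a cleaner way?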
@albanD any ideas?