Computing the gradient of a value w.r.t. a vector (0.4.0 vs 1.7)

@albanD, just one final clarification: would it also be equivalent to write the following?

        grad_target = (output_cl * label)
        grad_target.backward(gradient=label * output_cl, retain_graph=True)

Because, as far as I understand, doing backward on the sum of output_cl * label with respect to itself is equivalent to doing backward on each element-wise product, i.e. minimizing each product element-wise, and that is minimal iff the sum is minimal. Am I wrong?
Is it equivalent? If it isn't, what is the meaning of this derivation compared to the first one?
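To make sure I am describing backward(gradient=...) correctly, here is a tiny self-contained sketch of what I think it computes (x and v below are just placeholder tensors, not the actual output_cl and label from above):

        import torch

        # Placeholder tensors, only to illustrate the question.
        x = torch.randn(4, requires_grad=True)
        v = torch.randn(4)

        # Variant 1: backward on a non-scalar output with an explicit gradient argument.
        y = x ** 2
        y.backward(gradient=v)
        grad_a = x.grad.clone()

        # Variant 2: backward on the scalar (y * v).sum(), with v treated as a constant.
        x.grad = None
        y = x ** 2
        (y * v).sum().backward()
        grad_b = x.grad.clone()

        # Both give the vector-Jacobian product v^T * dy/dx.
        print(torch.allclose(grad_a, grad_b))  # True

Is that the right mental model to apply to the snippet above as well?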
I'd appreciate your answer :slight_smile:
Thanks.