How does autograd merge 'parallel paths'?

I’m trying to manually compute the gradients of the loss function at the top:

The loss is a combination of an L1 loss and an L_DSSIM loss, and both are functions of my model's prediction y(θ).
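For concreteness, something like a weighted sum (the exact form of the combination is not the point here, and the weight λ is just a placeholder):

$$
L(\theta) = (1 - \lambda)\, L_1\big(y(\theta)\big) + \lambda\, L_{\mathrm{DSSIM}}\big(y(\theta)\big)
$$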

Therefore I have two ways of applying the chain rule to calculate dL/dθ, as I have highlighted in blue, which I imagine as two parallel paths in the computational graph.

How do I combine them? Or better, how does PyTorch do it, so I can replicate it?

The total gradient is the sum of the gradients from each of the paths.
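You can check this in PyTorch directly. This is a minimal sketch with a toy prediction and a placeholder for the DSSIM term (not your actual model or loss), just to show that the gradient of the combined loss equals the sum of the gradients flowing through each parallel path:

```python
import torch
import torch.nn.functional as F

theta = torch.randn(5, requires_grad=True)
target = torch.randn(5)

def predict(t):
    return 2.0 * t + 1.0              # stand-in for y(theta)

lam = 0.2                             # assumed weighting between the two terms

# Gradient through each path separately
y = predict(theta)
path_l1 = (1 - lam) * F.l1_loss(y, target)
grad_path_l1 = torch.autograd.grad(path_l1, theta, retain_graph=True)[0]

path_dssim = lam * ((y - target) ** 2).mean()   # placeholder for L_DSSIM
grad_path_dssim = torch.autograd.grad(path_dssim, theta)[0]

# Gradient of the combined loss, as autograd computes it
y = predict(theta)
loss = (1 - lam) * F.l1_loss(y, target) + lam * ((y - target) ** 2).mean()
grad_total = torch.autograd.grad(loss, theta)[0]

print(torch.allclose(grad_total, grad_path_l1 + grad_path_dssim))  # True
```

Any weighting factors simply ride along with their own path; where the two branches meet, autograd just sums the incoming gradients.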


Why sum and not product? The chain rule involves products of derivatives.

Yes, connecting two paths sequentially gives a product, per the chain rule, but connecting two paths in parallel gives a sum.
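Written out with the multivariable chain rule, using the notation from your post:

$$
\frac{dL}{d\theta}
= \frac{\partial L}{\partial L_1}\,\frac{\partial L_1}{\partial y}\,\frac{\partial y}{\partial \theta}
\;+\;
\frac{\partial L}{\partial L_{\mathrm{DSSIM}}}\,\frac{\partial L_{\mathrm{DSSIM}}}{\partial y}\,\frac{\partial y}{\partial \theta}
$$

Within each path the factors multiply (sequential composition); the paths themselves are added (parallel composition). During backward, autograd realizes this by accumulating, i.e. summing, gradients whenever a tensor is consumed by more than one downstream operation.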
