I understand that in PyTorch, a node in the computation graph can communicate with its upstream and downstream neighbors by passing gradients. However, I wonder whether a node can communicate directly with a non-neighboring downstream node. For example, I would like every node in the graph to communicate directly with the loss computation (i.e., broadcast a variable from the loss node to the entire graph). Say that, based on the softmax computations inside the loss, I'd like to change the gradient computations of all the convolution layers.
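To make the scenario concrete, here is a minimal sketch of the kind of coupling I have in mind. The mechanism (tensor hooks reading a shared variable written during the loss computation), the toy model, and the scaling rule are all just my own illustrative guesses, not an approach I know to be standard:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A shared slot the loss computation writes into and the hooks read from --
# this is the "broadcast" channel from the loss node to the rest of the graph.
shared = {}

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 4, 3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

def scale_grad(grad):
    # Toy rule: scale every conv gradient by a scalar derived from the
    # softmax probabilities computed at the loss.
    return grad * shared["max_prob"]

# Attach the hook to every convolution layer's weight.
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        m.weight.register_hook(scale_grad)

x = torch.randn(2, 3, 8, 8)
target = torch.tensor([1, 3])

logits = model(x)
probs = F.softmax(logits, dim=1)
shared["max_prob"] = probs.max().detach()  # written by the "loss node"

loss = F.cross_entropy(logits, target)
loss.backward()  # hooks fire here, rescaling the conv weight gradients
```

Is something along these lines, or `Module.register_full_backward_hook`, the intended way to do this, or is there a cleaner mechanism?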

What is the least-effort approach to do this? Thank you!