This question has been confusing me for a long time. How does PyTorch decide whether each child node should increase or decrease, how does it determine the initial value of the current node, and how does it search for these values using backward?
The backward pass uses the chain rule to compute the gradient of the loss w.r.t. all trainable parameters. These gradients are then used to update the parameters so that the loss decreases.
Trainable parameters are initialized randomly (often using specific random distributions, scaling, etc.) and are updated afterwards during training.
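To make this concrete, here is a minimal sketch of one randomly initialized parameter being trained with plain gradient descent. The toy data (`x`, `target`), the learning rate `lr`, and the number of steps are assumptions chosen for illustration. With this plain update rule, the sign of `w.grad` decides whether `w` increases or decreases: the parameter is moved against its gradient.

```python
import torch

torch.manual_seed(0)  # only to make the random initialization reproducible

# A single trainable parameter, randomly initialized (hypothetical toy setup).
w = torch.randn(1, requires_grad=True)

x = torch.tensor([2.0])
target = torch.tensor([6.0])  # the loss is minimal at w = 3

lr = 0.05  # assumed learning rate
losses = []
for step in range(50):
    pred = w * x                         # forward pass
    loss = ((pred - target) ** 2).sum()  # squared error
    losses.append(loss.item())

    loss.backward()                      # chain rule: fills w.grad with dloss/dw

    with torch.no_grad():
        w -= lr * w.grad                 # step against the gradient
    w.grad.zero_()                       # gradients accumulate, so reset them

print(w.item(), losses[0], losses[-1])
```

In real training code the manual update is usually replaced by an optimizer from `torch.optim`, which may use a more elaborate rule than this one.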
The computation graph is created in the forward pass via Autograd, and the backward pass uses it for backpropagation.
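As a small sketch of what that graph looks like, the snippet below builds a two-operation expression and inspects the nodes Autograd recorded. The tensors `a` and `b` and the expression itself are made up for illustration.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(3.0, requires_grad=True)

c = a * b      # forward pass records a multiplication node
d = c + a      # ... and an addition node

print(d.grad_fn)                 # the node that produced d
print(d.grad_fn.next_functions)  # edges leading back toward c and a
print(a.grad_fn)                 # None: a is a leaf created by the user

d.backward()   # walk the recorded graph in reverse

# d = a*b + a, so dd/da = b + 1 and dd/db = a
print(a.grad, b.grad)
```

Leaf tensors such as `a` and `b` have no `grad_fn`; their gradients end up in `.grad` once `backward()` has traversed the graph.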
Thank you very much for your guidance!