It’s mentioned in the torch.nn.init — PyTorch 2.1 documentation that
functions used to initialize neural network parameters run in torch.no_grad()
mode so that they will not be taken into account by autograd.
I can’t get an intuition for this statement. It seems right that we shouldn’t need the initialization function to show up in the backward pass, but what would actually go wrong if autograd did track it?
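To make the question concrete, here is a minimal sketch of what I mean (my own experiment, not from the docs). It mimics what an init function does: an in-place write to a leaf tensor that has requires_grad=True, with and without torch.no_grad():

```python
import torch

# A "parameter" the way nn.Module creates one:
# a leaf tensor with requires_grad=True.
w = torch.empty(3, 3, requires_grad=True)

# Without no_grad, an in-place init on a leaf that requires grad
# is rejected by autograd:
try:
    w.uniform_()  # in-place fill, like torch.nn.init.uniform_
except RuntimeError as e:
    print(e)
    # "a leaf Variable that requires grad is being used
    #  in an in-place operation."

# Under no_grad, the same write succeeds and is invisible to autograd:
with torch.no_grad():
    w.uniform_()

print(w.grad_fn)  # None -- w stays a leaf, so gradients accumulate in w.grad
```

So it looks like the no_grad() wrapper is what keeps the parameter a plain leaf after initialization. Is the only effect of tracking it that the in-place write errors out (or that the parameter would get a grad_fn and stop being a leaf), or is there something deeper I’m missing?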