I know there are topics with very similar titles, but my question is about a special case.

I’m implementing an RL algorithm called Soft Actor-Critic. There’s a step in the algorithm where I need to update the weights of one network to minimize a loss computed from the output of a subsequent network.

Setting `requires_grad=False` on the subsequent network's parameters worked. But I wonder whether this actually saves computation (the time difference is too small for me to measure). My understanding is that autograd still has to backpropagate *through* the subsequent network — computing gradients with respect to its activations — in order to reach the gradients for the previous network's parameters.

Is this understanding correct?
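To make the setup concrete, here's a minimal sketch of what I mean (the two `nn.Linear` modules and their sizes are just placeholders, not my actual SAC networks):

```python
import torch
import torch.nn as nn

# "policy" is the network being trained; "critic" is the subsequent, frozen one.
policy = nn.Linear(4, 8)
critic = nn.Linear(8, 1)

# Freeze the subsequent network's parameters.
for p in critic.parameters():
    p.requires_grad_(False)

x = torch.randn(16, 4)
loss = critic(policy(x)).mean()
loss.backward()

# The previous network still receives gradients (backprop flowed
# through the frozen critic to reach it)...
assert all(p.grad is not None for p in policy.parameters())
# ...while the frozen critic's parameters accumulate none.
assert all(p.grad is None for p in critic.parameters())
```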