I am just a little unclear on the specific details of how autograd works here: specifically, whether the gradients with respect to the model weights are stored within the model's parameters or within the input/output Variable objects themselves.
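To make the question concrete, here is a minimal snippet of what I mean (the `Linear` model and the `sum()` "loss" are just placeholders):

```python
import torch
from torch.autograd import Variable

model = torch.nn.Linear(4, 2)      # stand-in model, just for illustration
x = Variable(torch.randn(1, 4))
out = model(x)
l = out.sum()                      # stand-in scalar "loss"
l.backward()

print(model.weight.grad)  # does the gradient live here, on the parameter...
print(out.grad)           # ...or here, on the output Variable? (this is what I'm unsure about)
```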
I am trying to train a model for use in a reinforcement learning task, and to do so I need to run the forward pass many times for different inputs before I get the reward values needed for the backward pass. I am wondering whether I can just do something like the following:
```python
# collect outputs from many forward passes
Y = Variable(torch.zeros(N))
for i in range(N):
    Y[i] = model(X[i])

# later, once the rewards are known, do the backward passes
for i in range(N):
    optim.zero_grad()
    l = loss(Y[i], r)
    l.backward()
    optim.step()
```
or whether I need to do something more sophisticated.