I am new to PyTorch and am trying to build a model made of two sub-networks ('former' and 'latter') that are trained and evaluated in this particular sequence:
- Part 1: the ‘latter’ part of the network is trained in isolation.
- Part 2: the 'former' part of the network is trained with its output fed to the 'latter' part (whose parameters are held constant), and the output of the 'latter' part is used for the loss.
- Part 3: data is evaluated with the ‘former’ part of the network only.
My question is: am I right that gradients still need to flow through the 'latter' part in Part 2? I would like the 'latter' part of the network to operate on the output of the 'former' part, but NOT have the 'latter' part's parameters updated in the process.
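To be explicit about what I mean by "not updating", here is a minimal sketch of the setup I have in mind. The toy layer sizes are placeholders, and freezing via `requires_grad = False` (plus giving the optimizer only the 'former' parameters) is my assumption of how to keep the pre-trained part fixed:

```python
import torch
import torch.nn as nn

# Placeholder architectures; the real networks are more complex.
former_model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 10))
latter_net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 10))

# Freeze 'latter': gradients can still flow THROUGH it back to 'former',
# but its own parameters accumulate no gradients.
for p in latter_net.parameters():
    p.requires_grad = False

# Only 'former' parameters are handed to the optimizer, so step()
# can never touch 'latter' even if gradients were computed for it.
optimizer = torch.optim.SGD(former_model.parameters(), lr=0.01)
```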
Based on the example below, will PyTorch "know" that 'latter' contributed to the loss, include it in the computation graph, and propagate gradients backwards through it to 'former'? Assume 'latter' is already trained in the example.
Any insight is highly appreciated.
```python
inp_var, out_var = Variable(field), Variable(lens)  # batch from torch.utils.data.DataLoader

optimizer.zero_grad()               # reset gradients for the new batch
output = former_model(inp_var)      # forward pass of 'former' model (to be trained)
output = latter_net(output)         # 'latter' is pre-trained; it consumes former's output
loss = criterion(output, inp_var)   # loss intentionally compares against the INPUT to 'former'
loss.backward()                     # backward pass
optimizer.step()                    # update parameters
```
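For completeness, this is how I intend to handle Part 3, evaluating data with the 'former' part alone. A sketch under my assumptions; using `eval()` and `torch.no_grad()` here is just my understanding of standard practice:

```python
former_model.eval()        # evaluation mode (matters if 'former' uses dropout/batchnorm)
with torch.no_grad():      # no computation graph needed at evaluation time
    predictions = former_model(inp_var)
```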