The question is below the code block. My module has two parts, partA and partB, which are triggered conditionally in the forward function:
class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.partA = nn.Sequential(
            # convolutions
        )
        self.partB = nn.Sequential(
            # convolutions,
            nn.Flatten(),
            nn.Linear()
        )

    def forward(self, input_tensor):
        if conditionB:
            return self.partB(input_tensor)
        elif conditionA:  # for clarity; could just use 'else' here
            return self.partB(self.partA(input_tensor))
## initialize and train
model = Network().to(device)
opt = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
for epoch in range(num_epochs):
    for i, data in enumerate(trainloader):
        # condition A step
        model.zero_grad()
        outputA = model(input_tensor)
        err = criterion(outputA, labels)
        err.backward()
        opt.step()

        # condition B step
        model.zero_grad()
        outputB = model(input_tensor)
        err = criterion(outputB, labels)
        err.backward()
        opt.step()
During training, when condition B is triggered, gradients flow through partB of the network but not partA. However, will the momentum term in SGD still cause an update to partA when opt.step() is called, even though partA was not part of the computational graph for the condition B step? If so, how can I avoid this? Should I give partA and partB separate optimizers?
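For reference, here is a minimal, self-contained experiment I could run to check this. The toy Linear layers, the class name Toy, and the use_partA flag are placeholders of mine standing in for the conv blocks and conditions above. The outcome appears to hinge on whether zero_grad leaves the unused gradients as zero tensors or sets them to None (the set_to_none argument), since SGD skips parameters whose .grad is None:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy two-part model mirroring the structure in the question.
class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.partA = nn.Linear(4, 4, bias=False)
        self.partB = nn.Linear(4, 1, bias=False)

    def forward(self, x, use_partA):
        if use_partA:
            return self.partB(self.partA(x))
        return self.partB(x)

torch.manual_seed(0)
model = Toy()
opt = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x = torch.randn(8, 4)

# Step 1: condition A -> both parts get gradients,
# so partA's momentum buffer becomes nonzero.
opt.zero_grad()
model(x, use_partA=True).sum().backward()
opt.step()

# Step 2: condition B, gradients zeroed IN PLACE (kept as zero tensors).
# partA.grad is an all-zero tensor, so SGD still applies its momentum buffer.
opt.zero_grad(set_to_none=False)
before = model.partA.weight.clone()
model(x, use_partA=False).sum().backward()
opt.step()
moved_with_zero_grads = not torch.equal(before, model.partA.weight)

# Step 3: condition B again, but gradients set to None.
# SGD skips parameters whose .grad is None, so partA stays put.
opt.zero_grad(set_to_none=True)
before = model.partA.weight.clone()
model(x, use_partA=False).sum().backward()
opt.step()
moved_with_none_grads = not torch.equal(before, model.partA.weight)

print("moved with zero grads:", moved_with_zero_grads)
print("moved with None grads:", moved_with_none_grads)
```

So the momentum question seems to depend on the zero_grad behavior, which is why I am unsure whether separate optimizers are actually needed.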