What are the effects of backpropagating on a truncated target/output?

Ge0rges · May 9, 2020, 4:12pm

In my training loop, what would be the effect of truncating the model output and target vectors like so:

action_output = model(inputs)
         
# What is the effect of the two lines below?
action_output = action_output[:, tasks]   # tasks is an array of 1*"target size"
action_target = action_target[:, tasks]

action_loss = criterion(action_output, action_target)
        
action_loss.backward()

I believe this truncates the computation graph, so that backpropagation only reaches the weights that affect the selected output nodes.

ptrblck · May 10, 2020, 6:58am

Yes, that should be the case. The rows of the last linear layer that correspond to the removed outputs would get a zero gradient:

import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18()
data = torch.randn(2, 3, 224, 224)
target = torch.randn(2, 1000)
criterion = nn.MSELoss()

output = model(data)
output = output[:, :100]
target = target[:, :100]
loss = criterion(output, target)
loss.backward()

# Rows 100 and up of fc.weight produce the outputs that were sliced away
print(model.fc.weight.grad[100:].abs().sum())
> tensor(0.)

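Note that the preceding layers would still have valid (nonzero) gradients, as they are not tied to a single output: every earlier layer also feeds the outputs that were kept, so the gradient flows back through those.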
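
For a quicker check that doesn't need torchvision, here is a minimal sketch of the same behaviour using an index list like `tasks` from the question. The toy model, the tensor sizes, and the values in `tasks` are made up for illustration and are not taken from the original training loop:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model: 8 inputs -> 16 hidden units -> 6 outputs
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 6))
criterion = nn.MSELoss()

inputs = torch.randn(4, 8)
action_target = torch.randn(4, 6)
tasks = [1, 4]  # assumed: indices of the output columns to keep

action_output = model(inputs)
# Select the same columns from output and target before computing the loss
action_loss = criterion(action_output[:, tasks], action_target[:, tasks])
action_loss.backward()

last = model[2]  # final linear layer
dropped = [i for i in range(6) if i not in tasks]

# Gradients for the rows/biases that produce the dropped outputs
dropped_grad = last.weight.grad[dropped].abs().sum().item()
dropped_bias_grad = last.bias.grad[dropped].abs().sum().item()
# Gradients for the kept rows and for the first (preceding) layer
kept_grad = last.weight.grad[tasks].abs().sum().item()
first_grad = model[0].weight.grad.abs().sum().item()

print("dropped rows:", dropped_grad, dropped_bias_grad)
print("kept rows nonzero:", kept_grad > 0)
print("first layer nonzero:", first_grad > 0)
```

The rows and biases of the last layer that belong to the dropped columns should end up with a gradient of exactly zero, while the kept rows and the first layer still receive gradients. One caveat: if the optimizer uses weight decay or momentum, a zero gradient does not necessarily mean those parameters stay unchanged during the update step.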