I’m not sure why the shapes differ, but apparently the wrong gradients are stored.
Here is a small dummy example using vgg16:
```python
import torch
from torchvision import models

grads = []
def save_grad(grad):
    grads.append(grad)

# Create model
model = models.vgg16()
model.eval()

# First approach: hook the output of the whole features block
x = torch.randn(1, 3, 224, 224)
output = model.features(x)
output.register_hook(save_grad)
output = model.avgpool(output)
output = output.view(output.size(0), -1)
output = model.classifier(output)
output.mean().backward()

# Reset
model.zero_grad()

# Second approach: iterate the submodules and hook the last one ('30')
output = x.clone()
for name, module in model.features._modules.items():
    output = module(output)
    if name == '30':
        output.register_hook(save_grad)
output = model.avgpool(output)
output = output.view(output.size(0), -1)
output = model.classifier(output)
output.mean().backward()

# Compare gradients
print((grads[0] == grads[1]).all())
> tensor(1, dtype=torch.uint8)
```
I tried to stick to your approach and as you can see, both methods yield the same gradients.
I guess `nn.DataParallel` isn’t being used here, since you are not calling the model directly but each submodule.
This might be related to this issue.