requires_grad is True inside no_grad() blocks

Hi, I’m new to PyTorch. Sorry if my explanation is unclear; English is not my native language.

I am implementing a GAN and I’m confused about weight initialization.
The code I am referring to initializes the weights as follows:

import torch
import torch.nn as nn

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # conv layers: weights ~ N(0, 0.02), biases set to 0
        nn.init.normal_(m.weight.data, 0.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
        print(nn.init.constant_(m.bias.data, 0).requires_grad)

    elif classname.find('BatchNorm') != -1:
        # batchnorm layers: weights ~ N(1, 0.02), biases set to 0
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

G.apply(weights_init)
D.apply(weights_init)

The output is:

False
False
False
False
False
False
False
False
False
False

But according to this, using with torch.no_grad() is recommended, so I changed it to:

def weights_init(m):
    classname = m.__class__.__name__

    # same initialization, but without .data and inside a no_grad() block
    with torch.no_grad():
        if classname.find('Conv') != -1:
            nn.init.normal_(m.weight, 0.0, 0.02)
            nn.init.constant_(m.bias, 0)
            print(nn.init.constant_(m.bias, 0).requires_grad)

        elif classname.find('BatchNorm') != -1:
            nn.init.normal_(m.weight, 1.0, 0.02)
            nn.init.constant_(m.bias, 0)
        
G.apply(weights_init)
D.apply(weights_init)

But now the output is:

True
True
True
True
True
True
True
True
True
True

I am confused.
What causes this difference, and does it affect the computed results?

Thanks.

Stick to the second approach, as the first one (going through the .data attribute) is deprecated and might yield unwanted side effects.
Your two prints differ because m.bias.data is a detached view of the parameter, and detached tensors always report requires_grad=False, while nn.init.constant_ returns the tensor it was given, so the second version prints the requires_grad of the parameter itself.
In a torch.no_grad() block all new operations won’t be tracked. The requires_grad attribute of your parameters will not be changed:

import torch
import torchvision.models as models

model = models.resnet50()

with torch.no_grad():
    # requires_grad of existing parameters is unchanged inside the block
    print(model.fc.weight.requires_grad)
    > True
    # but new operations are not tracked, so the output is detached
    out = model(torch.randn(1, 3, 224, 224))
    print(out.requires_grad)
    > False

# outside the block, operations are tracked again
out = model(torch.randn(1, 3, 224, 224))
print(out.requires_grad)
> True
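
To make the “unwanted side effects” concrete, here is a minimal standalone sketch (the tensors are illustrative, not taken from the GAN code above). It also shows why your first version printed False: mutating a tensor through .data silently corrupts a pending gradient computation, while the same in-place update under no_grad() is detected by autograd:

import torch

# Why the first version printed False: .data is a detached view,
# and detached tensors always report requires_grad=False.
w = torch.ones(3, requires_grad=True)
print(w.requires_grad)        # True  (the parameter itself)
print(w.data.requires_grad)   # False (the detached view)

# Mutations through .data bypass autograd's change detection:
out = (w * w).sum()           # autograd saves w for the backward pass
w.data.mul_(10)               # autograd does not see this in-place change
out.backward()
print(w.grad)                 # tensor([20., 20., 20.]); silently wrong,
                              # since the gradient at the forward values is 2

# The same mutation under no_grad() is detected instead:
w = torch.ones(3, requires_grad=True)
out = (w * w).sum()
with torch.no_grad():
    w.mul_(10)                # allowed and untracked, but bumps the version counter
out.backward()                # RuntimeError: one of the variables needed for
                              # gradient computation has been modified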