Tensor has None grad despite being leaf node and requiring grad

bongbang · November 18, 2019, 4:22am

I’m trying to visualize an already trained CNN model according to these instructions, which involve finding an input image that maximizes a feature map of interest. Unfortunately, the input variable is stuck with a None grad after back-propagation, so learning is impossible. How can I get the gradients to be computed?

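import os

import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device('cpu')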

class M7(nn.Module):
    # Zhang. M7-1
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, 3, padding=1)
        self.conv4 = nn.Conv2d(256, 512, 3, padding=1)
        self.conv5 = nn.Conv2d(512, 512, 3, padding=1)
        self.fc1 = nn.Linear(512 * 4 * 4, 1024) # 4 = 64/(2^4)
        self.fc2 = nn.Linear(1024, 200)

    def forward(self, X):
        X = F.max_pool2d(F.relu(self.conv1(X)), (2, 2), (2, 2))
        X = F.max_pool2d(F.relu(self.conv2(X)), (2, 2), (2, 2))
        X = F.max_pool2d(F.relu(self.conv3(X)), (2, 2), (2, 2))
        X = F.relu(self.conv4(X))
        X = F.max_pool2d(F.relu(self.conv5(X)), (2, 2), (2, 2))
        
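        # F.dropout defaults to training=True, so tie it to the module's mode
        X = F.dropout(torch.flatten(X, start_dim=1), training=self.training)
        X = F.dropout(F.relu(self.fc1(X)), training=self.training)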
        X = self.fc2(X)
        return F.log_softmax(X, dim=1) # log-probabilities

m7 = M7()

from torchvision import transforms
data_dir = '../input/casia-hwdb11-200-most-common/train/'
mean_image = 255 - np.load(os.path.join(data_dir, 'mean_image.npy'))
transform = transforms.Compose([
    transforms.Lambda(cv2.bitwise_not),
    transforms.Lambda(lambda img: img - mean_image),
    transforms.Lambda(torch.from_numpy),
    transforms.Lambda(lambda x: x.float()),
    transforms.Lambda(lambda x: x[None][None])
])

class SavedFeatures():
    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        self.features = torch.tensor(output, requires_grad=True, device=device)
    def close(self):
        self.hook.remove()

img = np.uint8(np.random.uniform(40 - 15, 40 + 15, (64, 64)))
img = transform(img)
img.requires_grad_()
m7.to(device).eval()
activations = SavedFeatures(list(m7.children())[1])
_ = m7(img)
loss = -activations.features[0, 0].mean()
loss.backward()

print('requires grad:', img.requires_grad) 
print('is leaf:', img.is_leaf) 
print()
print(loss)
print(img.grad)

Here’s the output, which always includes a warning about copy-constructing a tensor. I haven’t been able to identify what triggers this warning, but it may be related to my grad problem.

requires grad: True
is leaf: True

tensor(14.2372, grad_fn=<NegBackward>)
None

/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:16: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  app.launch_new_instance()

ptrblck · November 18, 2019, 5:23am

By rewrapping your intermediate activation in a new tensor, you are detaching it from the computation graph:

self.features = torch.tensor(output, requires_grad=True, device=device)

You could assign output directly to self.features instead, which will give you a valid gradient for your input tensor.
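A minimal sketch of that change, assuming a small stand-in model rather than your M7 (the nn.Sequential network and the 16x16 random input below are hypothetical and only there to keep the example self-contained):

```python
import torch
import torch.nn as nn


class SavedFeatures:
    """Stores a module's forward output without detaching it from the graph."""

    def __init__(self, module):
        self.features = None
        self.hook = module.register_forward_hook(self.hook_fn)

    def hook_fn(self, module, input, output):
        # Keep the original tensor so autograd can trace back to the input.
        self.features = output

    def close(self):
        self.hook.remove()


torch.manual_seed(0)

# Hypothetical stand-in for the first layers of the real model.
model = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, 3, padding=1),
).eval()

img = torch.randn(1, 1, 16, 16, requires_grad=True)
activations = SavedFeatures(model[2])  # hook the second conv layer
_ = model(img)

# Maximize the mean of feature map 0 by minimizing its negative.
loss = -activations.features[0, 0].mean()
loss.backward()
activations.close()

print('features grad_fn:', activations.features.grad_fn)
print('img.grad is None:', img.grad is None)
```

Since the stored activation is the same tensor the module produced, it still carries a grad_fn, and backward() can propagate through it to img.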

bongbang · November 18, 2019, 6:00am

Thank you. Your solution works. But how come creating a new tensor seems to work in the original code?

class SaveFeatures():
    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        self.features = torch.tensor(output,requires_grad=True).cuda()
    def close(self):
        self.hook.remove()

ptrblck · November 18, 2019, 6:03am

I’m not familiar with that code and only slightly familiar with FastAI.
Are you getting valid gradients for the input image using this code snippet?

bongbang · November 18, 2019, 6:07am

No, but the code apparently worked for the author.

ptrblck · November 18, 2019, 6:19am

I’m not sure, as it shouldn’t have worked before (creating a new tensor has always detached the output from the computation graph), and there is also an open issue that describes the same error.

I’m really not sure whether the code was working at some point or whether the author used another code base to write the blog post.
In the comments one user seems to have forked his code (link), which does not contain the recreation of the tensor.
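To illustrate the detaching, here is a small sketch on toy tensors (x, y and rewrapped are made-up names, not taken from the blog post's code). It suggests that the re-created tensor becomes a new leaf, so backward() stops there and never reaches the original input:

```python
import warnings

import torch

x = torch.randn(3, requires_grad=True)  # plays the role of the input image
y = x * 2                               # plays the role of the hooked activation

with warnings.catch_warnings():
    # Silence the copy-construct UserWarning for this demonstration.
    warnings.simplefilter("ignore")
    rewrapped = torch.tensor(y, requires_grad=True)

print('y.grad_fn:', y.grad_fn)                  # y is still part of the graph
print('rewrapped.grad_fn:', rewrapped.grad_fn)  # the copy has no history
print('rewrapped.is_leaf:', rewrapped.is_leaf)

rewrapped.sum().backward()
print('rewrapped.grad:', rewrapped.grad)  # gradient stops at the new leaf
print('x.grad:', x.grad)                  # nothing reaches the input
```

This matches the symptom in the first post: backward() runs without an error, but the gradient accumulates on the copy instead of on the input.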

bongbang · November 18, 2019, 8:01pm

Thank you. Now I’ve learned something about how to use GitHub as well.