Tensor has None grad despite being a leaf node and requiring grad

I’m trying to visualize an already trained CNN model according to these instructions, which involve finding an input image that maximizes a feature map of interest. Unfortunately, the input variable is stuck with a None grad after back-propagation, so learning is impossible. How can I get the gradients to be computed?

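import os

import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device('cpu')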

class M7(nn.Module):
    # Zhang. M7-1
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64, 128, 3, padding=1)
        self.conv3 = nn.Conv2d(128, 256, 3, padding=1)
        self.conv4 = nn.Conv2d(256, 512, 3, padding=1)
        self.conv5 = nn.Conv2d(512, 512, 3, padding=1)
        self.fc1 = nn.Linear(512 * 4 * 4, 1024) # 4 = 64/(2^4)
        self.fc2 = nn.Linear(1024, 200)

    def forward(self, X):
        X = F.max_pool2d(F.relu(self.conv1(X)), (2, 2), (2, 2))
        X = F.max_pool2d(F.relu(self.conv2(X)), (2, 2), (2, 2))
        X = F.max_pool2d(F.relu(self.conv3(X)), (2, 2), (2, 2))
        X = F.relu(self.conv4(X))
        X = F.max_pool2d(F.relu(self.conv5(X)), (2, 2), (2, 2))
        
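        # F.dropout defaults to training=True, so pass the module's mode
        # explicitly; otherwise dropout stays active even after m7.eval().
        X = F.dropout(torch.flatten(X, start_dim=1), training=self.training)
        X = F.dropout(F.relu(self.fc1(X)), training=self.training)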
        X = self.fc2(X)
        return F.log_softmax(X, dim=1) # log-probabilities

m7 = M7()

from torchvision import transforms
data_dir = '../input/casia-hwdb11-200-most-common/train/'
mean_image = 255 - np.load(os.path.join(data_dir, 'mean_image.npy'))
transform = transforms.Compose([
    transforms.Lambda(cv2.bitwise_not),
    transforms.Lambda(lambda img: img - mean_image),
    transforms.Lambda(torch.from_numpy),
    transforms.Lambda(lambda x: x.float()),
    transforms.Lambda(lambda x: x[None][None])
])

class SavedFeatures():
    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        self.features = torch.tensor(output, requires_grad=True, device=device)
    def close(self):
        self.hook.remove()

img = np.uint8(np.random.uniform(40 - 15, 40 + 15, (64, 64)))
img = transform(img)
img.requires_grad_()
m7.to(device).eval()
activations = SavedFeatures(list(m7.children())[1])
_ = m7(img)
loss = -activations.features[0, 0].mean()
loss.backward()

print('requires grad:', img.requires_grad) 
print('is leaf:', img.is_leaf) 
print()
print(loss)
print(img.grad)

Here’s the output, which always includes a warning about copy-constructing a tensor. I haven’t been able to identify what triggers this warning, but it may be related to my grad problem.

requires grad: True
is leaf: True

tensor(14.2372, grad_fn=<NegBackward>)
None

/opt/conda/lib/python3.6/site-packages/ipykernel_launcher.py:16: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  app.launch_new_instance()

By rewrapping your intermediate activation in a new tensor, you are detaching it from the computation graph:

self.features = torch.tensor(output, requires_grad=True, device=device)

You could assign output to self.features directly instead. That keeps the activation attached to the graph and will give you a valid gradient in your input tensor.
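A minimal sketch of that change, assuming a small stand-in network instead of M7 (the two-conv `model` and the 16x16 random input are made up for illustration; only the hook class mirrors the code above):

```python
import torch
import torch.nn as nn

class SavedFeatures:
    """Forward hook that keeps the module output attached to the graph."""
    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        # Store the output itself; no torch.tensor(...) copy, so grad_fn survives.
        self.features = output
    def close(self):
        self.hook.remove()

# Hypothetical stand-in model, not the M7 from the question.
model = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(4, 8, 3, padding=1),
).eval()

img = torch.rand(1, 1, 16, 16, requires_grad=True)
activations = SavedFeatures(model[2])
_ = model(img)
loss = -activations.features[0, 0].mean()
loss.backward()
activations.close()

print(img.grad is not None)  # True
print(img.grad.shape)        # same shape as img
```

With the hook written this way, `loss` is still connected to `img`, so `backward()` fills in `img.grad`.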

Thank you. Your solution works. But how come creating a new tensor seems to work in the original code?

class SaveFeatures():
    def __init__(self, module):
        self.hook = module.register_forward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        self.features = torch.tensor(output,requires_grad=True).cuda()
    def close(self):
        self.hook.remove()

I’m not familiar with that code and only a bit with FastAI.
Are you getting valid gradients for the input image using this code snippet?

No, but the code apparently worked for the author.

I’m not sure how, as it shouldn’t have worked before either (creating a new tensor has always detached the output from the computation graph), and there is also an open issue describing the same error.

I really can’t tell whether the code was working at some point or whether the author used another code base to write the blog post.
In the comments, one user seems to have forked his code (link), and that fork does not recreate the tensor.
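The detaching behaviour can be checked in isolation. This is a small sketch with made-up tensors `x`, `y` and `copied` (not taken from the blog post): copy-constructing from `y` produces a fresh leaf tensor, so backpropagating through the copy never reaches `x`.

```python
import warnings
import torch

x = torch.rand(3, requires_grad=True)
y = x * 2  # part of the graph: y.grad_fn is set

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the copy-construct UserWarning
    copied = torch.tensor(y, requires_grad=True)  # new leaf, no link back to x

copied.sum().backward()

print(y.grad_fn is None)                       # False
print(copied.is_leaf, copied.grad_fn is None)  # True True
print(x.grad)                                  # None: nothing flowed back to x
```

The gradient ends up in `copied.grad` only, which matches the `None` seen on the input image in the question.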

Thank you. Now I’ve learned something about how to use GitHub as well.