.grad showing None. How to ensure gradient flow?

import torch
import torch.nn as nn
import torch.nn.functional as F

class delft_block(nn.Module):

    def __init__(self, input_dimensions, k):
        super(delft_block, self).__init__()
        #self.conv1 = nn.Conv2d(input_dim, 512, kernel_size=kernel_size, padding=padding, stride=stride)
        #self.activate = nn.ReLU()
        self.k = k
        self.inp = input_dimensions
        #self.conv1 = nn.Conv2d(input_dimensions, 512, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(input_dimensions, 1, kernel_size=1, padding=0, stride=1)

    def forward(self, x):                                            # x = b,512,8,8
        b, c, h, w = x.size()
        out = self.conv2(x)                                          # out = b,1,8,8
        prob = nn.Softplus()(out)
        att = F.softmax(prob.view(b, -1), dim=1)                     # b,8*8 (pixel-wise attention scores, flattened)
        val, indices = torch.topk(att, self.k)                       # pixel indices with the top-k attention scores
        if phase == 'train':                                         # phase is assumed to be set globally elsewhere
            if att.requires_grad:
                att.retain_grad()
        ind_exp = indices.unsqueeze(-1).expand(b, self.k, self.inp)
        l_perm = x.permute(0, 2, 3, 1)                               # l_perm = b,8,8,512
        l_perm = l_perm.reshape(b, h*w, self.inp)                    # l_perm = b,h*w,512
        feat = torch.gather(l_perm, 1, ind_exp)                      # feat = b,k,512, using the indices found above to extract features
        feat = feat.permute(0, 2, 1)
        return feat                                                  # feat = b,512,k

Hi! I have written the above code. I am basically selecting the indices with the highest values from the att tensor using topk, and then using those indices to select the features feat from the l_perm tensor. I have also called att.retain_grad(). Now, while training, after one round of loss.backward(), when I print .grad for delft.conv it shows None. I think my network has a problem with gradient flow because of the index selection part. Can someone please point out the problem and a possible solution? Thanks.
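Here is a minimal sketch of how I am checking the gradients; the input shape, the toy sum loss, and setting phase = 'train' are just for illustration:

import torch

phase = 'train'                      # the block reads this global in forward
block = delft_block(input_dimensions=512, k=5)

x = torch.randn(2, 512, 8, 8, requires_grad=True)
feat = block(x)                      # feat: (2, 512, 5)

loss = feat.sum()                    # toy loss, only to be able to call backward
loss.backward()

print(block.conv2.weight.grad)       # this is what comes out as None for me
print(x.grad is None)                # gradient w.r.t. the input features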

@ptrblck can you please help? I am stuck because of this.
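For completeness, here is a tiny standalone sketch of just the topk + gather step, to show what I mean by the index selection part (toy tensors, shapes just for illustration):

import torch

att = torch.rand(1, 16, requires_grad=True)     # stands in for the attention scores
src = torch.rand(1, 16, 4, requires_grad=True)  # stands in for l_perm

val, idx = torch.topk(att, 3)                   # idx is an integer tensor with no grad history
ind_exp = idx.unsqueeze(-1).expand(1, 3, 4)

out = torch.gather(src, 1, ind_exp).sum()
out.backward()

print(src.grad is None)   # False: gather passes gradients back to src
print(att.grad is None)   # True here: only the indices from att were used, not the values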