```
import torch
import torch.nn as nn
import torch.nn.functional as F


class delft_block(nn.Module):

    def __init__(self, input_dimensions, k):
        super(delft_block, self).__init__()
        #self.conv1 = nn.Conv2d(input_dim, 512, kernel_size=kernel_size, padding=padding, stride=stride)
        #self.activate = nn.ReLU()
        self.k = k
        self.inp = input_dimensions
        #self.conv1 = nn.Conv2d(input_dimensions,512,kernel_size=3,padding=1)
        # 1x1 conv that maps every pixel's feature vector to a single attention logit
        self.conv2 = nn.Conv2d(input_dimensions, 1, kernel_size=1, padding=0, stride=1)

    def forward(self, x):  # x = b,512,8,8
        b, c, h, w = x.size()
        out = self.conv2(x)  # out = b,1,8,8
        prob = nn.Softplus()(out)
        att = F.softmax(prob.view(b, -1), dim=1)  # b,8*8 (pixel-wise attention scores, flattened)
        val, indices = torch.topk(att, self.k)  # pixel indices with the top k attention scores
        # if phase == 'train':
        if att.requires_grad:
            att.retain_grad()
        ind_exp = indices.unsqueeze(-1).expand(b, self.k, self.inp)  # b,k,512
        l_perm = x.permute(0, 2, 3, 1)  # l_perm = b,8,8,512
        l_perm = l_perm.reshape(b, h * w, self.inp)  # l_perm = b,h*w,512
        # use the indices found earlier to extract the feature vectors [should work now]
        feat = torch.gather(l_perm, 1, ind_exp)  # feat = b,k,512
        feat = feat.permute(0, 2, 1)
        return feat  # feat = b,512,k
```

Hi! I have written the above code. I use `topk` to pick the indices with the highest values in the `att` tensor, and then use those indices to gather the features `feat` from the `l_perm` tensor. I have also called `att.retain_grad()`. During training, after one round of `loss.backward()`, printing `.grad` for delft.conv shows `None`. I think gradients are not flowing through my network because of the index selection part. Can someone please explain what the problem is and how to solve it? Thanks.
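For reference, here is a minimal sketch of the situation described above. The sizes and the helper names `select` and `conv_grad_after_backward` are made up for illustration, as is the `weight_by_score` flag. It rests on the observation that the indices returned by `torch.topk` are integers, so no gradient passes through them, and that the gathered features depend only on `x`. If that is what is happening, the conv layer never takes part in the graph of the output and its `.grad` stays `None`. One possible workaround (an assumption, not necessarily what this block was designed to do) is to multiply the gathered features by the top-k scores `val`, which are differentiable with respect to `att`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
b, c, h, w, k = 2, 4, 3, 3, 2  # small hypothetical sizes


def select(x, conv, k, weight_by_score):
    """Pick the k highest-scoring pixels and return their features as b,c,k."""
    b, c, h, w = x.size()
    att = F.softmax(F.softplus(conv(x)).view(b, -1), dim=1)  # b,h*w
    val, indices = torch.topk(att, k)                        # both b,k
    flat = x.permute(0, 2, 3, 1).reshape(b, h * w, c)        # b,h*w,c
    feat = torch.gather(flat, 1, indices.unsqueeze(-1).expand(b, k, c))  # b,k,c
    if weight_by_score:
        # val carries a grad_fn back to att, so this ties the output to conv
        feat = feat * val.unsqueeze(-1)
    return feat.permute(0, 2, 1)                             # b,c,k


def conv_grad_after_backward(weight_by_score):
    """Run one forward/backward pass and return the conv weight gradient."""
    conv = nn.Conv2d(c, 1, kernel_size=1)
    # x requires grad here to mimic features coming from earlier layers
    x = torch.randn(b, c, h, w, requires_grad=True)
    out = select(x, conv, k, weight_by_score)
    out.sum().backward()
    return conv.weight.grad


grad_plain = conv_grad_after_backward(weight_by_score=False)
grad_weighted = conv_grad_after_backward(weight_by_score=True)
print("index-only selection, conv grad is None:", grad_plain is None)
print("score-weighted selection, conv grad is None:", grad_weighted is None)
```

Note that `att.retain_grad()` only keeps the gradient of `att` when `att` is part of the graph of the loss, so it would not change the outcome in the index-only case. Whether scaling the features by their scores is appropriate depends on what the downstream layers expect.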