Now, my network has two branches: one is the normal ResNet50, and the other forks off from ResNet50's third convolution block. In the forked branch I apply several operations, one of which is shown below (a rough sketch of this two-branch layout follows the snippet).
# https://github.com/yanx27/3DGNN_pytorch/blob/b5e0188e56f926cff9b2c9a68bedf42cb3b42d2f/models.py#L340
# adapted from https://discuss.pytorch.org/t/build-your-own-loss-function-in-pytorch/235/6
# (x - y)^2 = x^2 - 2*x*y + y^2
def get_knn_indices(self, batch_mat, k):
    # batch_mat: (N, HW, C) feature matrix; r: (N, HW, HW) Gram matrix
    r = torch.bmm(batch_mat, batch_mat.permute(0, 2, 1))
    N = r.size()[0]
    HW = r.size()[1]
    if self.use_gpu:
        batch_indices = torch.zeros((N, HW, k)).cuda()
    else:
        batch_indices = torch.zeros((N, HW, k))

    for idx, val in enumerate(r):
        # get the diagonal elements (the squared norm of each row)
        diag = val.diag().unsqueeze(0)
        diag = diag.expand_as(val)
        # compute the pairwise distance matrix via (x - y)^2 = x^2 - 2*x*y + y^2
        D = (diag + diag.t() - 2 * val).sqrt()
        # indices of the k nearest neighbours (smallest distances) per row
        topk, indices = torch.topk(D, k=k, largest=False)
        batch_indices[idx] = indices.data
    return batch_indices
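For context, here is a minimal sketch of one way such a two-branch network could be wired up; it is not the actual model. The fork point (assumed here to be torchvision's layer3), the second branch's head, and the output sizes are all assumptions for illustration only.

# Minimal two-branch sketch (assumptions only, not the author's model).
import torch
import torch.nn as nn
import torchvision

class TwoBranchNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        backbone = torchvision.models.resnet50()
        # Shared stem up to the assumed fork point (layer3, 1024 output channels).
        self.stem = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2, backbone.layer3,
        )
        # Branch 1: the remainder of the normal ResNet50.
        self.branch1 = nn.Sequential(backbone.layer4, backbone.avgpool)
        self.fc1 = backbone.fc
        # Branch 2: a hypothetical head on top of the forked features.
        self.branch2 = nn.Sequential(
            nn.Conv2d(1024, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        feat = self.stem(x)  # shared features at the fork point
        out1 = self.fc1(torch.flatten(self.branch1(feat), 1))
        out2 = self.fc2(torch.flatten(self.branch2(feat), 1))
        return out1, out2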
I think get_knn_indices is non-differentiable, which may mean the parameters of this branch never receive gradients and are therefore not updated during backpropagation.
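That suspicion can be checked directly on a toy tensor. The snippet below is a small self-contained sketch (made-up shapes, use_gpu ignored): torch.topk indices carry no grad_fn, and copying indices.data into a freshly created tensor detaches the result from the autograd graph.

import torch

batch_mat = torch.randn(2, 16, 8, requires_grad=True)   # made-up (N, HW, C) features
r = torch.bmm(batch_mat, batch_mat.permute(0, 2, 1))
print(r.grad_fn)          # <BmmBackward0 ...>: still on the autograd graph

_, indices = torch.topk(r[0], k=3, largest=False)
print(indices.grad_fn)    # None: integer indices are not differentiable

batch_indices = torch.zeros(2, 16, 3)
batch_indices[0] = indices.data      # copying .data detaches the values from the graph
print(batch_indices.requires_grad)   # False: no gradient can flow back through this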
So, I have some questions:
1. How can I tell whether this suspicion is correct?
2. Is there a way to visualize the backpropagation path (the gradient flow) of the entire network? If so, my first question should be easy to answer.
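One possible approach to both questions, sketched below with placeholder names (model, inputs, targets, and criterion are not from the original post): after loss.backward(), parameters that are cut off from the loss keep grad=None (or an all-zero grad), and the third-party torchviz package can render the autograd graph so you can see whether a branch appears on the backward path at all.

import torch
from torchviz import make_dot   # third-party: pip install torchviz (needs the graphviz binary)

outputs = model(inputs)                  # placeholder model and data
loss = criterion(outputs, targets)

# (1) Visual check: render the autograd graph that backward() will traverse;
#     modules missing from the rendered graph are not on the gradient path.
make_dot(loss, params=dict(model.named_parameters())).render("backward_graph")

# (2) Numerical check: after backward(), parameters unreachable from the loss
#     keep grad=None (or an all-zero grad).
model.zero_grad()
loss.backward()
for name, p in model.named_parameters():
    if p.grad is None or p.grad.abs().sum() == 0:
        print("no gradient reached:", name)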
I am working on implementing this as well. At what point during training should you check the gradients?
Currently, I am checking at the end of each epoch by iterating over my model's parameters and reading each parameter's .grad, as shown in the code below. However, for some reason, when I visualize the gradients in TensorBoard all my layers show zero gradients, even though the weight and bias histograms show that they are changing.
loss.backward()
optimizer.step()
optimizer.zero_grad()
for tag, parm in model.named_parameters():
    writer.add_histogram(tag, parm.grad.data.cpu().numpy(), epoch)
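For what it is worth, one possible explanation consistent with the snippet above: optimizer.zero_grad() clears every parameter's .grad before the histograms are written, so the logged gradients are always zero. A sketch of the same logging with the histogram call moved before the gradients are cleared (same placeholder names as above):

loss.backward()
# Log gradients while .grad still holds the values from this backward pass;
# zero_grad() clears them (to zeros or None depending on the PyTorch version).
for tag, parm in model.named_parameters():
    writer.add_histogram(tag, parm.grad.data.cpu().numpy(), epoch)
optimizer.step()
optimizer.zero_grad()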