Now, my network has two branches: one is the normal ResNet50, and the other forks off from ResNet50's third convolution block. In the forked branch I apply several operations, one of which is shown below (a rough sketch of this two-branch layout follows the snippet).
# https://github.com/yanx27/3DGNN_pytorch/blob/b5e0188e56f926cff9b2c9a68bedf42cb3b42d2f/models.py#L340
# adapted from https://discuss.pytorch.org/t/build-your-own-loss-function-in-pytorch/235/6
# (x - y)^2 = x^2 - 2*x*y + y^2
def get_knn_indices(self, batch_mat, k):
    # batch_mat: (N, HW, C) feature matrix; r: (N, HW, HW) Gram matrix
    r = torch.bmm(batch_mat, batch_mat.permute(0, 2, 1))
    N = r.size()[0]
    HW = r.size()[1]
    if self.use_gpu:
        batch_indices = torch.zeros((N, HW, k)).cuda()
    else:
        batch_indices = torch.zeros((N, HW, k))

    for idx, val in enumerate(r):
        # get the diagonal elements (the squared norm of each row)
        diag = val.diag().unsqueeze(0)
        diag = diag.expand_as(val)
        # compute the pairwise distance matrix via (x - y)^2 = x^2 - 2*x*y + y^2
        D = (diag + diag.t() - 2 * val).sqrt()
        # indices of the k nearest neighbours (smallest distances) per row
        topk, indices = torch.topk(D, k=k, largest=False)
        batch_indices[idx] = indices.data
    return batch_indices
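For context, here is a minimal sketch of one way such a two-branch network could be wired up; it is not the actual model. The fork point (assumed here to be torchvision's layer3), the second branch's head, and the output sizes are all assumptions for illustration only.

# Minimal two-branch sketch (assumptions only, not the author's model).
import torch
import torch.nn as nn
import torchvision

class TwoBranchNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        backbone = torchvision.models.resnet50()
        # Shared stem up to the assumed fork point (layer3, 1024 output channels).
        self.stem = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool,
            backbone.layer1, backbone.layer2, backbone.layer3,
        )
        # Branch 1: the remainder of the normal ResNet50.
        self.branch1 = nn.Sequential(backbone.layer4, backbone.avgpool)
        self.fc1 = backbone.fc
        # Branch 2: a hypothetical head on top of the forked features.
        self.branch2 = nn.Sequential(
            nn.Conv2d(1024, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc2 = nn.Linear(256, num_classes)

    def forward(self, x):
        feat = self.stem(x)  # shared features at the fork point
        out1 = self.fc1(torch.flatten(self.branch1(feat), 1))
        out2 = self.fc2(torch.flatten(self.branch2(feat), 1))
        return out1, out2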
I think get_knn_indices is non-differentiable, which may mean the parameters of this branch never receive gradients and are therefore not updated during backpropagation.
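That suspicion can be checked directly on a toy tensor. The snippet below is a small self-contained sketch (made-up shapes, use_gpu ignored): torch.topk indices carry no grad_fn, and copying indices.data into a freshly created tensor detaches the result from the autograd graph.

import torch

batch_mat = torch.randn(2, 16, 8, requires_grad=True)   # made-up (N, HW, C) features
r = torch.bmm(batch_mat, batch_mat.permute(0, 2, 1))
print(r.grad_fn)          # <BmmBackward0 ...>: still on the autograd graph

_, indices = torch.topk(r[0], k=3, largest=False)
print(indices.grad_fn)    # None: integer indices are not differentiable

batch_indices = torch.zeros(2, 16, 3)
batch_indices[0] = indices.data      # copying .data detaches the values from the graph
print(batch_indices.requires_grad)   # False: no gradient can flow back through this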
So, I have some questions:
1. How can I tell whether this suspicion is correct?
2. Is there a way to visualize the backpropagation path (the gradient flow) of the entire network? If so, my first question should be easy to answer.
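One possible approach to both questions, sketched below with placeholder names (model, inputs, targets, and criterion are not from the original post): after loss.backward(), parameters that are cut off from the loss keep grad=None (or an all-zero grad), and the third-party torchviz package can render the autograd graph so you can see whether a branch appears on the backward path at all.

import torch
from torchviz import make_dot   # third-party: pip install torchviz (needs the graphviz binary)

outputs = model(inputs)                  # placeholder model and data
loss = criterion(outputs, targets)

# (1) Visual check: render the autograd graph that backward() will traverse;
#     modules missing from the rendered graph are not on the gradient path.
make_dot(loss, params=dict(model.named_parameters())).render("backward_graph")

# (2) Numerical check: after backward(), parameters unreachable from the loss
#     keep grad=None (or an all-zero grad).
model.zero_grad()
loss.backward()
for name, p in model.named_parameters():
    if p.grad is None or p.grad.abs().sum() == 0:
        print("no gradient reached:", name)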
I am working on implementing this as well. At what point during training should you check the gradients?
Currently, I am checking at the end of each epoch by iterating over my model's parameters and reading each parameter's .grad, as shown in the code below. However, for some reason, when I visualize the gradients in TensorBoard all my layers show zero gradients, even though the weight and bias histograms show that they are changing.
loss.backward()
optimizer.step()
optimizer.zero_grad()
for tag, parm in model.named_parameters():
    writer.add_histogram(tag, parm.grad.data.cpu().numpy(), epoch)
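For what it is worth, one possible explanation consistent with the snippet above: optimizer.zero_grad() clears every parameter's .grad before the histograms are written, so the logged gradients are always zero. A sketch of the same logging with the histogram call moved before the gradients are cleared (same placeholder names as above):

loss.backward()
# Log gradients while .grad still holds the values from this backward pass;
# zero_grad() clears them (to zeros or None depending on the PyTorch version).
for tag, parm in model.named_parameters():
    writer.add_histogram(tag, parm.grad.data.cpu().numpy(), epoch)
optimizer.step()
optimizer.zero_grad()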