Hi PyTorch community,
I’m seeking clarification on how the forward and backward passes work internally, and how gradients are computed, when I run a single forward pass followed by multiple backward passes.
Context:
I am currently working on a CNN model with multiple outputs (num outputs = num bounding boxes).
I am trying to compute the gradients of a specific detection’s top_score w.r.t. a sub-module/layer’s output using module backward hooks (a simplified sketch of this setup is below).
Importantly, these gradients are intended for debugging and model explainability rather than for training or updating model parameters; thus, I will not be using optimizers, and inputs will be non-batched (batch_size = 1).
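For reference, here is a minimal sketch of the setup I have in mind; the toy model, the hooked layer, and all variable names are placeholders standing in for my real detector:

```python
import torch
import torch.nn as nn

# Toy stand-in for my detector: the real model outputs a (n_boxes, n_classes) scores tensor
class ToyDetector(nn.Module):
    def __init__(self, n_boxes=3, n_classes=5):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.head = nn.Linear(8, n_boxes * n_classes)
        self.n_boxes, self.n_classes = n_boxes, n_classes

    def forward(self, x):
        feat = self.conv(x).mean(dim=(2, 3))  # global average pool -> (1, 8)
        return self.head(feat).view(self.n_boxes, self.n_classes)

model = ToyDetector()
captured_grads = []

def backward_hook(module, grad_input, grad_output):
    # grad_output[0] should hold d(top_score) / d(layer output) for the current backward call
    captured_grads.append(grad_output[0].detach().clone())

handle = model.conv.register_full_backward_hook(backward_hook)

x = torch.randn(1, 3, 16, 16)        # batch_size = 1, as in my setup
scores = model(x)                    # (n_boxes, n_classes)
top_scores, _ = scores.max(dim=1)    # top score for each bbox
```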
My specific questions are:
- When and how exactly are gradients computed in PyTorch during the forward and backward passes?
- If I perform one forward pass and multiple backward passes, how does PyTorch handle this internally? For example:

  ```python
  for bbox_index in range(num_boxes):
      # the model outputs a scores tensor of shape (n_boxes, n_classes);
      # top_scores is a tensor computed from that output, so it should be
      # attached to the computation graph; it contains the top score for each bbox
      top_scores[bbox_index].backward(retain_graph=True)
  ```
- Do gradients accumulate if I perform `top_scores[1].backward(retain_graph=True)` followed by `top_scores[2].backward(retain_graph=True)`? Does `retain_graph` keep the gradient values, or just the computation graph?
- Do I need to set gradients to 0 after each `top_scores[i].backward(...)`? If so, when is the appropriate time to do this? (See the sketch below for what I am currently considering.)
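For the last two questions, this is the kind of loop I have in mind; whether `model.zero_grad()` is the right way to clear gradients here, and whether it is needed at all given that I only read gradients through hooks, is exactly what I am unsure about:

```python
for bbox_index in range(num_boxes):
    # my assumption: clear any previously accumulated .grad values
    # before each backward call so they don't mix between detections
    model.zero_grad(set_to_none=True)
    top_scores[bbox_index].backward(retain_graph=True)
```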
I’d appreciate any insights!