Hello all,
I am trying to generate attribution maps for YoloV8. I was able to do this for YoloV7 relatively easily.
With YoloV8, I always seem to get a tensor full of NaNs, which is odd because the general process I am following is similar to what I did for YoloV7. The architecture has not changed much between the two, but the code has, since YoloV8 comes from Ultralytics while YoloV7 came from a different group.
This is what I’ve tried:
- Used detect_anomaly to find the root cause of the NaNs during the backward pass. Torch states:
'ConvolutionBackward0' returned nan values in its 0th output.
This is the initial layer. I also checked the inputs for NaNs, and they are clean.
- Double-checked the gradient computation to make sure it is correct, comparing it against my YoloV7 code and common vanilla gradient attribution methods.
- Debugged the YoloV8 forward pass for any unusual manipulation of the input data. I could not find any.
- Simplified the code down to a minimal example to verify and reproduce the issue.
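For reference, the checks above can be sketched on a small stand-in network. This is only a minimal illustration, not the YoloV8 code path: `toy_model` and `vanilla_gradient` are hypothetical names, and the tiny conv stack is an assumption used to show the gradient call wrapped in anomaly detection, followed by a finiteness check on the result.

```python
import torch
import torch.nn as nn


def vanilla_gradient(model, imgs):
    """Return d(sum of outputs)/d(imgs), computed under anomaly detection."""
    imgs = imgs.clone().requires_grad_(True)
    # detect_anomaly raises an error naming the backward op that produced NaNs.
    with torch.autograd.detect_anomaly():
        scores = model(imgs)
        grads = torch.autograd.grad(
            scores,
            imgs,
            grad_outputs=torch.ones_like(scores),
        )[0]
    return grads


torch.manual_seed(0)
# Hypothetical stand-in for the detector: two conv layers with a SiLU in between.
toy_model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.SiLU(),
    nn.Conv2d(4, 2, kernel_size=1),
)
imgs = torch.rand(1, 3, 8, 8)

assert torch.isfinite(imgs).all()  # inputs are clean
grads = vanilla_gradient(toy_model, imgs)
print(grads.shape, torch.isfinite(grads).all().item())
```

On this toy model the gradients have the same shape as the input and are finite, which is the behaviour I expected from YoloV8 as well.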
My Code:
from ultralytics.models.yolo.detect import DetectionTrainer
from ultralytics.nn.tasks import DetectionModel
from ultralytics.utils import RANK
import torch


class CustomModel(DetectionModel):
    def loss(self, batch, preds=None):
        if not hasattr(self, 'criterion'):
            self.criterion = self.init_criterion()
        # Track gradients with respect to the input images
        imgs = batch['img'].requires_grad_(True)
        preds = self.forward(imgs) if preds is None else preds
        loss = self.criterion(preds, batch)
        pred_scores = self.get_pred_scores(preds)
        # Using autograd.grad so I can change the target and test w.r.t. loss, obj_loss, etc.
        gradients = torch.autograd.grad(pred_scores,
                                        imgs,
                                        grad_outputs=torch.ones_like(pred_scores),
                                        retain_graph=True)[0]
        # The rest of the attribution method would go here (assume for now it's just vanilla gradient attribution)
        return loss

    def get_pred_scores(self, preds):
        # This is ripped from the v8DetectionLoss
        feats = preds[1] if isinstance(preds, tuple) else preds
        _, pred_scores = torch.cat([xi.view(feats[0].shape[0], self.model[-1].no, -1) for xi in feats], 2).split(
            (self.model[-1].reg_max * 4, self.model[-1].nc), 1)
        pred_scores = pred_scores.permute(0, 2, 1).contiguous()
        return pred_scores
class CustomTrainer(DetectionTrainer):
    def get_model(self, cfg=None, weights=None, verbose=True):
        """Return a YOLO detection model."""
        model = CustomModel(cfg, nc=self.data['nc'], verbose=verbose and RANK == -1)
        if weights:
            model.load(weights)
        return model
Assume this CustomTrainer is passed into the model.train method and run.
Luckily, this is far less code than in YoloV7, as Ultralytics does a good job with abstraction!
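In case it helps anyone reproduce this, the score extraction in get_pred_scores can be checked in isolation with dummy feature maps. This is a rough sketch under assumptions: `split_pred_scores` is a hypothetical standalone copy of the method, and the values of `reg_max`, `nc`, and the feature map sizes are made up for illustration rather than read from a real model.

```python
import torch


def split_pred_scores(feats, reg_max, nc):
    """Standalone copy of get_pred_scores: keep only the class-score channels."""
    no = reg_max * 4 + nc  # channels per anchor: box distribution + class scores
    batch = feats[0].shape[0]
    # Flatten each scale to (batch, no, H*W) and concatenate along the anchor axis.
    flat = torch.cat([xi.view(batch, no, -1) for xi in feats], 2)
    _, pred_scores = flat.split((reg_max * 4, nc), 1)
    # Reorder to (batch, anchors, nc).
    return pred_scores.permute(0, 2, 1).contiguous()


# Illustrative values only (assumptions, not taken from a trained model).
reg_max, nc = 16, 3
feats = [torch.randn(2, reg_max * 4 + nc, s, s) for s in (8, 4, 2)]

pred_scores = split_pred_scores(feats, reg_max, nc)
print(pred_scores.shape)  # torch.Size([2, 84, 3]): 8*8 + 4*4 + 2*2 = 84 anchors
```

The scores come out finite and correctly shaped here, so the slicing itself does not appear to introduce the NaNs.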