The NaN loss appears only when I use wide_resnet_fpn or resnext_fpn as the backbone, whereas the classic ResNets with FPN work properly as the backbone in Faster R-CNN. However, the torchvision documentation says that all of them can be used in the model below. Any idea?
Error
Epoch: [0] [ 0/457] eta: 0:26:59 lr: 0.000032 loss: 2.0617 (2.0617) loss_classifier: 0.6785 (0.6785) loss_box_reg: 0.5333 (0.5333) loss_objectness: 0.6902 (0.6902) loss_rpn_box_reg: 0.1597 (0.1597) time: 3.5431 data: 0.6946 max mem: 6346
Loss is nan, stopping training
{'loss_classifier': tensor(nan, device='cuda:2', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(nan, device='cuda:2', grad_fn=<DivBackward0>), 'loss_objectness': tensor(nan, device='cuda:2', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_box_reg': tensor(nan, device='cuda:2', grad_fn=<DivBackward0>)}
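For context, here is a minimal sketch of the kind of check that produces the "Loss is nan, stopping training" message. It assumes the loss terms have already been converted to plain Python floats; the helper name non_finite_losses and the NaN values in the second dictionary are hypothetical and only illustrate the log above.

```python
import math


def non_finite_losses(loss_dict):
    """Return the names of loss terms whose value is NaN or infinite."""
    return [name for name, value in loss_dict.items() if not math.isfinite(value)]


# Values from the first logged iteration (all finite).
first_step = {
    "loss_classifier": 0.6785,
    "loss_box_reg": 0.5333,
    "loss_objectness": 0.6902,
    "loss_rpn_box_reg": 0.1597,
}

# Hypothetical later iteration in which every term has become NaN.
failing_step = {name: float("nan") for name in first_step}

for step in (first_step, failing_step):
    bad = non_finite_losses(step)
    if bad:
        print("Loss is nan, stopping training:", bad)
```

Checking each term separately, rather than only the summed loss, shows whether a single head diverges first or all of them go NaN together, as they do in the log above.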
Model
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone


def FRCNN_resnetfpn_backbone(backbone_name='resnet101', pre_trained=True):
    # Reference: https://github.com/pytorch/vision/blob/master/torchvision/models/detection/backbone_utils.py
    # Build a ResNet-family backbone with an FPN on top.
    backbone = resnet_fpn_backbone(backbone_name, pre_trained)
    """
    resnet_fpn_backbone:
    Args:
        backbone_name (string): resnet architecture. Possible values are 'ResNet', 'resnet18', 'resnet34', 'resnet50',
            'resnet101', 'resnet152', 'resnext50_32x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2'
        pretrained (bool): If True, returns a model with backbone pre-trained on Imagenet
        norm_layer (torchvision.ops): it is recommended to use the default value. For details visit:
            (https://github.com/facebookresearch/maskrcnn-benchmark/issues/267)
        trainable_layers (int): number of trainable (not frozen) resnet layers starting from final block.
            Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. default=3
        returned_layers (list of int): The layers of the network to return. Each entry must be in ``[1, 4]``.
            By default all layers are returned.
        extra_blocks (ExtraFPNBlock or None): if provided, extra operations will
            be performed. It is expected to take the fpn features, the original
            features and the names of the original features as input, and returns
            a new list of feature maps and their corresponding names. By
            default a ``LastLevelMaxPool`` is used.
    """
    # Two classes: background plus one object class.
    model = FasterRCNN(backbone,
                       num_classes=2)
    return model
Thanks a lot for any help!