Hello
I get a "Loss is nan" error when I train Faster R-CNN with a ResNeXt-101 backbone.
My code is as follows:
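from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
backbone = resnet_fpn_backbone('resnext101_32x8d', pretrained=True)
model = FasterRCNN(backbone, num_classes)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)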
Error message:
Epoch: [0] [ 0/7208] eta: 1:27:42 lr: 0.000040 loss: 40613806080.0000 (40613806080.0000) loss_box_reg: 7979147264.0000 (7979147264.0000) loss_classifier: 11993160704.0000 (11993160704.0000) loss_objectness: 9486380032.0000 (9486380032.0000) loss_rpn_box_reg: 11155118080.0000 (11155118080.0000) time: 0.7301 data: 0.4106 max mem: 1241
Loss is nan, stopping training
When I change the backbone to resnet50 or resnet152, no error occurs.
I tried reducing the learning rate, but the error still occurs:
Epoch: [0] [ 0/7208] eta: 1:37:48 lr: 0.000000 loss: 34383663104.0000 (34383663104.0000) loss_box_reg: 8708817920.0000 (8708817920.0000) loss_classifier: 8583784448.0000 (8583784448.0000) loss_objectness: 7424135168.0000 (7424135168.0000) loss_rpn_box_reg: 9666924544.0000 (9666924544.0000) time: 0.8142 data: 0.4027 max mem: 1242
Epoch: [0] [ 10/7208] eta: 0:44:46 lr: 0.000000 loss: nan (nan) loss_box_reg: nan (nan) loss_classifier: nan (nan) loss_objectness: nan (nan) loss_rpn_box_reg: inf (nan) time: 0.3732 data: 0.0429 max mem: 1919
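To make it easier to see which loss term goes non-finite first, here is a minimal sketch of a per-component check. It assumes the training loop has the loss values available as a dict of floats, the way they appear in the log lines above. The helper name non_finite_losses is hypothetical and not part of torchvision. Note that the values at iteration 0 are huge but still finite, so a check like this only fires on a later iteration.

```python
import math


def non_finite_losses(loss_dict):
    """Return the names of loss components that are nan or inf.

    Hypothetical helper for debugging; not a torchvision function.
    """
    return [
        name
        for name, value in loss_dict.items()
        if not math.isfinite(float(value))
    ]


# Values copied from the iteration-10 log line above.
losses = {
    "loss_box_reg": float("nan"),
    "loss_classifier": float("nan"),
    "loss_objectness": float("nan"),
    "loss_rpn_box_reg": float("inf"),
}

bad = non_finite_losses(losses)
if bad:
    print("Non-finite losses:", ", ".join(bad))
```

Calling this on every iteration, rather than only every 10th as in the printed log, would show whether one term blows up before the others.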