Hi folks,
While training a Faster R-CNN (the code is available here), I am running into the following problem. The printed losses show that only loss_objectness is NaN, so I suspect something is wrong with the input images or the targets, but I cannot work out what exactly. Any suggestions would be appreciated.
Loss is nan, stopping training
{'loss_classifier': tensor(0.0168, grad_fn=<NllLossBackward0>), 'loss_box_reg': tensor(0.0019, grad_fn=<DivBackward0>), 'loss_objectness': tensor(nan, grad_fn=<BinaryCrossEntropyWithLogitsBackward0>), 'loss_rpn_box_reg': tensor(0.3363, grad_fn=<DivBackward0>)}
An exception has occurred, use %tb to see the full traceback.
SystemExit: 1
The training images are normalised, with values between 0 and 1. The targets are passed as a dictionary. Below are the details of a sample image patch and its targets just before they are fed to the model. The patch's histogram is attached as well.
targets keys dict_keys(['boxes', 'labels', 'length_labels', 'scene_id', 'chip_id', 'image_id', 'area', 'iscrowd'])
targets boxes tensor([[226., 681., 236., 691.],
[495., 19., 505., 29.],
[495., 12., 505., 22.],
[704., 704., 714., 714.],
[703., 407., 713., 417.],
[345., 749., 355., 759.],
[336., 700., 346., 710.],
[766., 712., 776., 722.]])
targets labels tensor([1, 3, 3, 1, 1, 1, 2, 1])
targets scene_id 590dd08f71056cacv
targets chip_id tensor(858)
targets image_id tensor(32)
targets area tensor([100., 100., 100., 100., 100., 100., 100., 100.])
targets iscrowd tensor([0, 0, 0, 0, 0, 0, 0, 0])
torch.Size([3, 800, 800])
torch.float32
tensor(0.) tensor(1.)
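For reference, here is a minimal sketch of the kind of sanity check that could be run on every sample's boxes before training. It assumes the boxes use the [x1, y1, x2, y2] pixel convention that torchvision's detection models expect, and that every patch is 800x800 like the one printed above. The helper name find_box_problems is hypothetical, not part of any library, and the sketch is pure Python so it runs without a GPU or a dataset.

```python
import math


def find_box_problems(boxes, height, width):
    """Return a list of human-readable problems for [x1, y1, x2, y2] boxes.

    Hypothetical helper for illustration only. `boxes` is a list of
    4-element lists, e.g. the result of targets["boxes"].tolist().
    """
    problems = []
    for i, box in enumerate(boxes):
        if len(box) != 4:
            problems.append(f"box {i}: expected 4 coordinates, got {len(box)}")
            continue
        # NaN or inf coordinates would propagate straight into the losses.
        if not all(math.isfinite(v) for v in box):
            problems.append(f"box {i}: non-finite coordinate {box}")
            continue
        x1, y1, x2, y2 = box
        # Degenerate boxes have zero or negative width/height.
        if x2 <= x1 or y2 <= y1:
            problems.append(f"box {i}: zero or negative width/height {box}")
        # Boxes should lie inside the image patch.
        if x1 < 0 or y1 < 0 or x2 > width or y2 > height:
            problems.append(f"box {i}: outside the {height}x{width} image {box}")
    return problems


# Boxes copied from the printout above.
sample_boxes = [
    [226.0, 681.0, 236.0, 691.0],
    [495.0, 19.0, 505.0, 29.0],
    [495.0, 12.0, 505.0, 22.0],
    [704.0, 704.0, 714.0, 714.0],
    [703.0, 407.0, 713.0, 417.0],
    [345.0, 749.0, 355.0, 759.0],
    [336.0, 700.0, 346.0, 710.0],
    [766.0, 712.0, 776.0, 722.0],
]

problems = find_box_problems(sample_boxes, height=800, width=800)
print(problems or "no box problems found")
```

With real tensors, the call would be find_box_problems(targets["boxes"].tolist(), 800, 800). The sample above passes these checks, so if every sample passes, the boxes are probably not the cause. The images could be checked in the same spirit with torch.isfinite(img).all() over the whole dataset, since the printed min and max only cover this one patch.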