I am very excited to see a library-supported implementation of Faster R-CNN and the COCO dataset wrappers… however, I cannot get mine to train.
Tutorial: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html
Repo: https://github.com/EMCP/torchvision-faster-rcnn/blob/master/model_components/training.py
It seems I need to fix a NaN loss, and I am wondering if the "mask" part of the COCO data is to blame.
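In case the data really is the culprit: here is a quick sanity check I could run over the raw COCO JSON (pure stdlib, just a sketch). Boxes with zero or negative width/height become empty RoIs after the xywh → xyxy conversion, which is a common source of NaN in the box-regression losses:

```python
import json

def find_degenerate_boxes(coco_json_path):
    """Return (annotation id, bbox) pairs whose COCO bbox [x, y, w, h]
    has non-positive width or height -- a frequent cause of NaN in
    loss_rpn_box_reg / loss_box_reg."""
    with open(coco_json_path) as f:
        coco = json.load(f)
    bad = []
    for ann in coco.get('annotations', []):
        x, y, w, h = ann['bbox']
        if w <= 0 or h <= 0:
            bad.append((ann['id'], ann['bbox']))
    return bad
```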
I was also stuck trying to use a pretrained ResNet-101 here instead of the smaller default network, but that didn't take either.
/home/emcp/anaconda3/envs/pytorch_150/bin/python /home/emcp/Dev/git/EMCP/faster-rcnn-torchvision/model_components/training.py
Using hyperParameters:
{'hyperParameters': {'anchor_ratios': [0.5, 1, 2],
'anchor_scales': [4, 8, 16, 32],
'batch_size': 1,
'bbox_inside_weights': [1.0, 1.0, 1.0, 1.0],
'bbox_normalize_means': [0.0, 0.0, 0.0, 0.0],
'bbox_normalize_stds': [0.1, 0.1, 0.2, 0.2],
'bbox_normalize_targets_precomputed': True,
'bg_threshold_high': 0.5,
'bg_threshold_low': 0.0,
'bias_decay': False,
'checkpoint_interval': 10000,
'crop_resize_with_max_pool': False,
'display': 20,
'display_interval': 100,
'double_bias': False,
'epoch_max': 100,
'epoch_start': 0,
'feat_stride': [16],
'fg_fraction': 0.25,
'fg_threshold': 0.5,
'has_rpn': True,
'learning_decay_gamma': 0.1,
'learning_decay_step': 3,
'learning_rate': 0.005,
'learning_weight_decay': 0.0005,
'max_num_gt_boxes': 50,
'max_size': 1000,
'momentum': 0.9,
'net': 'res101',
'optimizer': 'sgd',
'pixel_means': [[[102.9801, 115.9465, 122.7717]]],
'pooling_mode': 'align',
'pooling_size': 7,
'pre_trained_model_path': '/home/emcp/Dev/git/EMCP/faster-rcnn-pytorch-native/model_components/models_pretrained/resnet101_caffe.pth',
'proposal_method': 'gt',
'random_seed': 3,
'rpn_batch_size': 256,
'rpn_bbox_inside_weights': [1.0, 1.0, 1.0, 1.0],
'rpn_clobber_positives': False,
'rpn_fg_fraction': 0.5,
'rpn_min_size': 8,
'rpn_negative_overlap': 0.3,
'rpn_nms_thresh': 0.7,
'rpn_positive_overlap': 0.7,
'rpn_positive_weight': -1.0,
'rpn_post_nms_top_n': 2000,
'rpn_pre_nms_top_n': 12000,
'scales': [1394],
'testing': {'bbox_reg': True,
'check_epoch': 1,
'check_point': 477,
'check_session': 1,
'enable_visualization': True,
'has_rpn': True,
'max_size': 400,
'nms': 0.3,
'parallelization_mode': 0,
'rpn_post_nms_top_n': 1000,
'scales': [400]},
'trim_height': 600,
'trim_width': 600,
'truncated': False,
'use_all_gt': True,
'use_class_agnostic_regression': False,
'use_flipped': True,
'use_pretrained_net': True},
'pytorch_engine': {'enable_cuda': True,
'enable_multiple_gpus': False,
'enable_tfb': True,
'num_workers': 1,
'resume_checkpoint': False,
'resume_checkpoint_epoch': 1,
'resume_checkpoint_num': 0,
'resume_checkpoint_session': 1,
'session': 1},
'resnet': {'fixed_blocks': 1}}
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
.
/home/emcp/anaconda3/envs/pytorch_150/lib/python3.7/site-packages/torch/nn/functional.py:2854: UserWarning: The default behavior for interpolate/upsample with float scale_factor will change in 1.6.0 to align with other frameworks/libraries, and use scale_factor directly, instead of relying on the computed output size. If you wish to keep the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
warnings.warn("The default behavior for interpolate/upsample with float scale_factor will change "
/opt/conda/conda-bld/pytorch_1587428266983/work/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of nonzero is deprecated:
nonzero(Tensor input, *, Tensor out)
Consider using one of the following signatures instead:
nonzero(Tensor input, *, bool as_tuple)
Loss is nan, stopping training
{'loss_classifier': tensor(2.9877, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(0., device='cuda:0', grad_fn=<DivBackward0>), 'loss_objectness': tensor(17.7084, device='cuda:0',
grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_box_reg': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>)}
Process finished with exit code 1
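As a stopgap I'm considering guarding the training step rather than letting it crash: skip any batch whose summed loss is non-finite and clip gradients so `loss_rpn_box_reg` can't blow up a later step. A rough sketch (the loss-dict shape follows the torchvision detection API; `max_norm=10.0` is an arbitrary value I picked, not anything from the tutorial):

```python
import torch

def train_step(model, optimizer, images, targets, max_norm=10.0):
    """One guarded training step: skip the update on a non-finite loss
    instead of aborting, and clip gradient norms before the optimizer
    step. Returns the loss value, or None if the batch was skipped."""
    optimizer.zero_grad()
    loss_dict = model(images, targets)   # dict of per-component losses
    loss = sum(loss_dict.values())
    if not torch.isfinite(loss):
        print('non-finite loss, skipping batch:', loss_dict)
        return None
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```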