I’m working on a detection problem with Cascade R-CNN in detectron2.
I’m testing augmentations right now.
T.RandomFlip(prob=0.5, horizontal=True, vertical=False),
T.ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice')
With these two options, training runs without problems.
But when I add
T.RandomCrop('relative_range', (0.4, 0.5))
the following error occurs:
FloatingPointError: Loss became infinite or NaN at iteration=1!
loss_dict = {'loss_cls_stage0': 1.613979458808899, 'loss_box_reg_stage0': 0.011088866740465164, 'loss_cls_stage1': 1.668488621711731, 'loss_box_reg_stage1': 0.00575869157910347, 'loss_cls_stage2': 1.4664241075515747, 'loss_box_reg_stage2': 0.0028667401056736708, 'loss_rpn_cls': 1.469098448753357, 'loss_rpn_loc': inf}
[11/21 06:53:37 d2.engine.hooks]: Total training time: 0:00:00 (0:00:00 on hooks)
[11/21 06:53:37 d2.utils.events]: iter: 1 total_loss: 4.9 loss_cls_stage0: 1.609 loss_box_reg_stage0: 0.005077 loss_cls_stage1: 1.685 loss_box_reg_stage1: 0.002878 loss_cls_stage2: 1.464 loss_box_reg_stage2: 0.002352 loss_rpn_cls: 0.09238 loss_rpn_loc: 0.03937 data_time: 0.1260 lr: 2e-05 max_mem: 5386M
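To make it easier to see which term blows up, here is a minimal sketch that scans a loss dictionary for non-finite values. The helper name `non_finite_losses` is made up for illustration and is not part of detectron2; the values are abridged from the error message above.

```python
import math


def non_finite_losses(loss_dict):
    """Return the names of loss terms whose value is inf or NaN."""
    return [name for name, value in loss_dict.items() if not math.isfinite(value)]


# Values abridged from the FloatingPointError message above.
loss_dict = {
    "loss_cls_stage0": 1.613979458808899,
    "loss_box_reg_stage0": 0.011088866740465164,
    "loss_rpn_cls": 1.469098448753357,
    "loss_rpn_loc": float("inf"),
}

print(non_finite_losses(loss_dict))  # ['loss_rpn_loc']
```

In this run only `loss_rpn_loc` is infinite, so the problem seems to sit in the RPN box regression rather than in the cascade heads.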
My initial parameter values are as follows:
cfg.DATALOADER.NUM_WORKERS = 4
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("Misc/cascade_mask_rcnn_R_50_FPN_3x.yaml") # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 6
cfg.SOLVER.MAX_ITER = 15000 # Need to train longer
#cfg.SOLVER.CHECKPOINT_PERIOD = 1000
cfg.MODEL.RESNETS.DEPTH = 50 #RESNET 50,101
#RESNEXT parameters
cfg.SOLVER.CHECKPOINT_PERIOD = 200
cfg.MODEL.ROI_HEADS.NAME = "CascadeROIHeads"
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 4 # 4 classes
cfg.MODEL.MASK_ON = False
cfg.OUTPUT_DIR = save_path
cfg.SOLVER.LR_SCHEDULER_NAME = 'WarmupCosineLR'
cfg.SOLVER.BASE_LS = 0.0001
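
One possible explanation, not verified for this dataset, is that the crop leaves some ground-truth boxes with zero width or height once they are clipped to the crop window, and such boxes can produce an infinite box-regression target. The sketch below only illustrates that effect in plain Python. The function names and the image and box coordinates are hypothetical, and this is not how detectron2 implements cropping internally.

```python
def clip_box_to_crop(box, crop):
    """Shift an XYXY box into crop coordinates and clip it to the crop window.

    box  = (x0, y0, x1, y1) in original image pixels
    crop = (cx0, cy0, w, h): crop origin and size in original image pixels
    """
    x0, y0, x1, y1 = box
    cx0, cy0, w, h = crop
    nx0 = min(max(x0 - cx0, 0), w)
    ny0 = min(max(y0 - cy0, 0), h)
    nx1 = min(max(x1 - cx0, 0), w)
    ny1 = min(max(y1 - cy0, 0), h)
    return (nx0, ny0, nx1, ny1)


def is_degenerate(box, min_size=1e-5):
    """True if the box has (almost) no width or no height."""
    x0, y0, x1, y1 = box
    return (x1 - x0) <= min_size or (y1 - y0) <= min_size


# Hypothetical crop: a 450x360 window starting at (500, 400).
crop = (500, 400, 450, 360)

inside = clip_box_to_crop((550, 450, 650, 600), crop)
outside = clip_box_to_crop((100, 100, 300, 300), crop)

print(inside, is_degenerate(inside))    # (50, 50, 150, 200) False
print(outside, is_degenerate(outside))  # (0, 0, 0, 0) True
```

If a custom mapper is used, checking the transformed annotations for boxes like `outside` and dropping them before training would be one way to test this guess. detectron2's `detection_utils.filter_empty_instances` does a similar filtering on `Instances`.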