Okay, I think I have narrowed down the problem and realise your answer is pertinent, @ptrblck. The network takes in images of varying sizes rather than a fixed input size, which causes the memory usage to fluctuate.
I have managed to train the network on a V100 GPU, and here is the output from the first two logged iterations (0 and 200):
Epoch: [1] [ 0/7925] eta: 6:09:17 lr: 0.000500 loss: 2.0259 (2.0259) loss_classifier: 0.6635 (0.6635) loss_box_reg: 0.0236 (0.0236) loss_objectness: 0.6907 (0.6907) loss_rpn_box_reg: 0.6483 (0.6483) time: 2.7958 data: 1.1561 max mem: 7464
Epoch: [1] [ 200/7925] eta: 1:25:06 lr: 0.000500 loss: 1.0027 (1.3102) loss_classifier: 0.1189 (0.2118) loss_box_reg: 0.0081 (0.0198) loss_objectness: 0.5959 (0.6495) loss_rpn_box_reg: 0.3003 (0.4291) time: 0.6534 data: 0.0127 max mem: 10348
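To make the fluctuation easier to see per iteration (if I understand the reference script correctly, the max mem column above comes from torch.cuda.max_memory_allocated), something like the sketch below can log each batch's image shapes next to that batch's own peak memory. This is a minimal sketch, not the torchvision reference logging code; `model`, `optimizer`, `data_loader`, and `device` are assumed to be defined elsewhere.

```python
import torch

MB = 1024 ** 2

def train_one_epoch_with_mem(model, optimizer, data_loader, device):
    model.train()
    for i, (images, targets) in enumerate(data_loader):
        # Reset the peak counter so each iteration reports its own maximum
        # instead of the running maximum since the start of training.
        torch.cuda.reset_peak_memory_stats(device)

        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)
        losses = sum(loss_dict.values())

        optimizer.zero_grad()
        losses.backward()
        optimizer.step()

        shapes = [tuple(img.shape) for img in images]
        peak = torch.cuda.max_memory_allocated(device) / MB
        print(f"iter {i}: shapes={shapes} peak mem={peak:.0f} MB")
```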
I have opened an issue this morning proposing to remove GeneralizedRCNNTransform as a compulsory transform applied to all FasterRCNN models, and I believe fixed-size inputs would keep the memory from fluctuating.
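For reference, this is roughly what I mean by fixed-size inputs (a minimal sketch assuming the torchvision FasterRCNN API; the 800x800 size and num_classes=2 are just example values, not what I am actually training with). If every image is already resized to the same square shape and min_size == max_size, the internal transform should not rescale it again, so every batch ends up with an identical shape:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

FIXED_SIZE = 800  # example value; must also be the size used in the dataset transforms

# min_size == max_size keeps GeneralizedRCNNTransform from rescaling images
# that are already FIXED_SIZE x FIXED_SIZE.
model = fasterrcnn_resnet50_fpn(
    num_classes=2,        # example: 1 foreground class + background
    min_size=FIXED_SIZE,
    max_size=FIXED_SIZE,
)

# Dummy fixed-size batch: with identical input shapes each iteration,
# the activation memory should stay roughly constant.
images = [torch.rand(3, FIXED_SIZE, FIXED_SIZE) for _ in range(2)]
targets = [
    {"boxes": torch.tensor([[10.0, 10.0, 100.0, 100.0]]), "labels": torch.tensor([1])}
    for _ in range(2)
]

model.train()
loss_dict = model(images, targets)
print({k: v.item() for k, v in loss_dict.items()})
```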
Thanks again!