torch.cuda.OutOfMemoryError when training Mask R-CNN

The model parameters and the input might be tiny compared to the intermediate activations, which are stored during the forward pass because they are needed for the gradient computation in the backward pass.
Have a look at this post showing an example.
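
As a rough back-of-the-envelope illustration of the point above, here is a minimal sketch in plain Python (no PyTorch needed). It is not Mask R-CNN: the three-layer 3x3 conv stack, the batch size of 2 and the 800x800 input are made-up values, and the function name `estimate_memory` is hypothetical. It also assumes float32, stride 1 with "same" padding, and that one output tensor per conv layer is kept for backward, which is a simplification of what autograd really stores.

```python
BYTES_PER_FLOAT32 = 4


def estimate_memory(batch, height, width, layers, in_channels=3):
    """Estimate bytes for parameters, input and stored conv outputs.

    layers: list of (c_in, c_out, kernel_size) tuples describing conv layers
    with stride 1 and 'same' padding, so the spatial size never changes.
    """
    param_bytes = 0
    activation_bytes = 0
    for c_in, c_out, k in layers:
        # weights (c_out * c_in * k * k) plus one bias per output channel
        param_bytes += (c_out * c_in * k * k + c_out) * BYTES_PER_FLOAT32
        # one output feature map per layer, assumed to be kept for backward
        activation_bytes += batch * c_out * height * width * BYTES_PER_FLOAT32
    input_bytes = batch * in_channels * height * width * BYTES_PER_FLOAT32
    return param_bytes, input_bytes, activation_bytes


# Hypothetical toy network: three 3x3 convs with 64 output channels each.
layers = [(3, 64, 3), (64, 64, 3), (64, 64, 3)]
param_bytes, input_bytes, activation_bytes = estimate_memory(2, 800, 800, layers)

MIB = 1024 ** 2
print(f"parameters : {param_bytes / MIB:.2f} MiB")
print(f"input      : {input_bytes / MIB:.2f} MiB")
print(f"activations: {activation_bytes / MIB:.2f} MiB")
```

Under these assumptions the parameters take about 0.3 MiB and the input about 15 MiB, while the stored activations take roughly 940 MiB. The real numbers for your model will differ, but the ratio shows why reducing the batch size or the image resolution usually helps more than shrinking the model.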