Hello,
I’m working with Facebook's implementation of Faster R-CNN: https://github.com/facebookresearch/maskrcnn-benchmark.
In this implementation, the classification/regression heads are defined in https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/modeling/roi_heads/box_head/roi_box_predictors.py
I would like to do multi-label classification (assume we have N classes), so I chose to run N binary classifiers in parallel (N instances of the predictor cited above), like this:
import torch
import torch.nn as nn

from maskrcnn_benchmark.modeling.roi_heads.roi_heads import build_roi_heads

class Roi_head(nn.Module):
    def __init__(self, cfg, args):
        super(Roi_head, self).__init__()
        self.classes_number = N  # N = number of classes
        # Build one independent binary ROI head per class.
        for i in range(1, self.classes_number + 1):
            setattr(self, "roi_heads%d" % i, build_roi_heads(cfg))

    def forward(self, features, proposals, targets=None):
        labels = targets[0].get_field("labels")
        detector_losses = 0
        for i in range(1, self.classes_number + 1):
            # Only run the i-th head if at least one target has class i.
            if torch.nonzero(labels == i).squeeze(1).shape[0] != 0:
                device = labels.device
                # Relabel all targets as foreground (1) for the binary head.
                label = torch.ones(labels.shape, dtype=torch.float32, device=device)
                targets[0].add_field("labels", label)
                x, result, detector_loss = getattr(self, "roi_heads%d" % i)(features, proposals, targets)
                detector_losses += sum(loss for loss in detector_loss.values())
        return x, result, detector_losses
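For context, here is roughly how this head is called in my training step (a sketch: backbone, rpn, roi_head, optimizer, and data_loader are placeholder names, not the exact ones from my code):

# Sketch of one training iteration, following the usual
# maskrcnn-benchmark structure (placeholder names).
for images, targets in data_loader:
    features = backbone(images.tensors)
    proposals, proposal_losses = rpn(images, features, targets)
    x, result, detector_losses = roi_head(features, proposals, targets)
    losses = detector_losses + sum(loss for loss in proposal_losses.values())
    optimizer.zero_grad()
    losses.backward()  # <- the memory increase happens during this call
    optimizer.step()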
But when I try to train this, I sometimes get an unexplained GPU memory increase (about 100 MB) during the backward pass (not on every iteration), which eventually leads to an out-of-memory error…
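For what it's worth, here is how I am measuring the increase around the backward call (a minimal sketch using torch.cuda.memory_allocated):

torch.cuda.synchronize()
allocated_before = torch.cuda.memory_allocated()
losses.backward()
torch.cuda.synchronize()
allocated_after = torch.cuda.memory_allocated()
# On the problematic iterations this reports a jump of roughly 100 MB.
print("backward delta: %.1f MB" % ((allocated_after - allocated_before) / 1024 ** 2))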
Do you have any ideas that could help me understand what causes this leak during backward?
Thanks