MaskRCNN crashes silently when predicting

Nuno-Mota · April 30, 2020, 2:59pm

Hi

When running PyTorch’s MaskRCNN implementation in eval mode, the program crashes ‘silently’ (error: ‘zsh: killed python <program> <args>’).

I’ve traced the problem back to the function paste_masks_in_image(masks, boxes, img_shape, padding=1) in torchvision.models.detection.roi_heads:

def paste_masks_in_image(masks, boxes, img_shape, padding=1):
    # type: (Tensor, Tensor, Tuple[int, int], int)
    masks, scale = expand_masks(masks, padding=padding)
    boxes = expand_boxes(boxes, scale).to(dtype=torch.int64)
    im_h, im_w = img_shape

    if torchvision._is_tracing():
        return _onnx_paste_masks_in_image_loop(masks, boxes,
                                               torch.scalar_tensor(im_h, dtype=torch.int64),
                                               torch.scalar_tensor(im_w, dtype=torch.int64))[:, None]
    res = [
        paste_mask_in_image(m[0], b, im_h, im_w)
        for m, b in zip(masks, boxes)
    ]
    if len(res) > 0:
        ret = torch.stack(res, dim=0)[:, None]
    else:
        ret = masks.new_empty((0, 1, im_h, im_w))
    return ret

In particular, the problem seems to be that when res has a lot of elements (each one a full im_h x im_w mask), the whole program is killed for lack of memory:

res = [
    paste_mask_in_image(m[0], b, im_h, im_w)
    for m, b in zip(masks, boxes)
]
if len(res) > 0:
    ret = torch.stack(res, dim=0)[:, None]

For example, artificially reducing the number of results by replacing the torch.stack(…) line with the following solves the problem (although it obviously yields incomplete results):

res = [
    paste_mask_in_image(m[0], b, im_h, im_w)
    for m, b in zip(masks, boxes)
]
if len(res) > 0:
    ret = torch.stack(res[:2], dim=0)[:, None]
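
As a rough check on the memory explanation, the size of the pasted masks can be estimated by hand. The sketch below is only back-of-the-envelope arithmetic, not a measurement: it assumes float32 masks (4 bytes per element), 100 detections per image, and that the list res and the stacked tensor briefly exist at the same time. The helper name pasted_masks_bytes is made up for illustration.

```python
# Back-of-the-envelope memory estimate for the pasted masks.
# Assumptions (not measured): float32 masks, 100 detections per image,
# and torch.stack creating a copy while the list `res` is still alive.
BYTES_PER_ELEMENT = 4      # assumed float32
IM_H, IM_W = 2054, 2456    # image size from this post (height/width order assumed)


def pasted_masks_bytes(num_detections, im_h=IM_H, im_w=IM_W,
                       bytes_per_element=BYTES_PER_ELEMENT):
    """Bytes needed to hold `num_detections` full-image masks."""
    return num_detections * im_h * im_w * bytes_per_element


per_mask = pasted_masks_bytes(1)      # one full-size mask
full_list = pasted_masks_bytes(100)   # the whole list `res`
peak = 2 * full_list                  # list + stacked copy at the same time
truncated = pasted_masks_bytes(2)     # what res[:2] would need once stacked

for label, n_bytes in [("one mask", per_mask), ("list of 100", full_list),
                       ("peak with stack", peak), ("stack of res[:2]", truncated)]:
    print(f"{label:>18}: {n_bytes / 2**30:.2f} GiB")
```

Under these assumptions the full result needs gigabytes, while the truncated version needs only a few tens of megabytes, which would be consistent with the crash disappearing.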

My main point is that it would be nice to have some kind of verbose error, if at all possible.

In my case, with 16GB of RAM in my machine, images of size 2054x2456, and quite a lot of memory consumed by other programs, I believe the best solution is simply to reduce the value of MaskRCNN’s box_detections_per_img parameter.
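
One way to pick a value for box_detections_per_img is to work backwards from a memory budget. This is a minimal sketch under the same assumptions as the estimate for the pasted masks (float32 masks, two copies alive at the peak); the function max_detections_for_budget and the 1 GiB budget are hypothetical, chosen only for illustration.

```python
def max_detections_for_budget(budget_bytes, im_h, im_w,
                              bytes_per_element=4, copies=2):
    """Largest detection count whose pasted masks fit in `budget_bytes`.

    bytes_per_element=4 assumes float32 masks; copies=2 assumes the list of
    masks and the stacked tensor coexist briefly. Both are assumptions.
    """
    per_detection = im_h * im_w * bytes_per_element * copies
    return max(budget_bytes // per_detection, 0)


budget = 1 * 2**30  # hypothetical budget of 1 GiB for the mask output
cap = max_detections_for_budget(budget, 2054, 2456)
print(cap)

# The cap could then be passed when building the model, for example:
# model = torchvision.models.detection.maskrcnn_resnet50_fpn(
#     box_detections_per_img=cap)
```

The estimate ignores everything else the model allocates, so it is better treated as an upper bound than as a safe setting.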

Maybe someone knows of some other possible workaround?

Cheers