FasterRCNN - half the batch disappears when using DataParallel

I trained a FasterRCNN model on a single GPU. At inference time, I’d like to run prediction on multiple GPUs. However, for a batch of, say, 50 images, the model’s output contains only 25 predictions.

Here’s the relevant part of my code:

    import torch
    import torch.nn as nn
    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False)

    N_CLASSES = 2  # 1 object class + background

    # Swap the box predictor head for one with N_CLASSES outputs
    INP_FEATURES = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(INP_FEATURES, N_CLASSES)

    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # Wrap in DataParallel when more than one GPU is visible
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    print("Using", torch.cuda.device_count(), "GPUs")

    model.to(device)
    checkpoint = torch.load(args.pretrained_path)

    # DataParallel keeps the original model under .module
    if torch.cuda.device_count() > 1:
        model.module.load_state_dict(checkpoint['model_state_dict'])
    else:
        model.load_state_dict(checkpoint['model_state_dict'])
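One guess, which I have not verified on my setup: with 2 GPUs each replica returns a list of 25 prediction dicts, and `nn.DataParallel` seems to merge list outputs by pairing them up element by element rather than concatenating the lists. If so, I'd get 25 dicts back, each mixing boxes from two different images. Below is a torch-free sketch of that suspected behavior. `mimic_gather` and `fake_replica` are hypothetical stand-ins I made up for illustration (plain lists instead of tensors), not PyTorch APIs.

```python
def mimic_gather(per_gpu_outputs):
    """Hypothetical stand-in for merging list-of-dict outputs from replicas."""
    merged = []
    # zip pairs the i-th prediction dict of every replica together
    for dicts in zip(*per_gpu_outputs):
        # values under the same key are concatenated across replicas
        merged.append({k: [x for d in dicts for x in d[k]] for k in dicts[0]})
    return merged


def fake_replica(n_images, gpu_id):
    # One dict per image, each holding a single made-up box and score
    return [{"boxes": [(gpu_id, i)], "scores": [0.9]} for i in range(n_images)]


# 50 images split evenly over 2 (pretend) GPUs
outputs = [fake_replica(25, 0), fake_replica(25, 1)]
merged = mimic_gather(outputs)

print(len(merged))              # number of prediction dicts after merging
print(len(merged[0]["boxes"]))  # boxes per merged dict
```

If this is really what happens, the 25 results I see would not be "half the batch" but all 50 predictions folded together pairwise.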

Any clues?

Thanks!