Torchvision - Faster RCNN - Empty Training Images

Hi,

I am using the Torchvision Faster RCNN model for Object detection.
I am trying to train the model with my own dataset and followed the tutorial on: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html

Now I am facing the problem of how to deal with training images that contain no object to detect.
At the moment I am using

def __getitem__(self, idx):
    boxes = torch.as_tensor(boxes, dtype=torch.float32)
    labels = torch.as_tensor(labels, dtype=torch.int64)
    labels += 1  # shift labels so 0 stays reserved for the background class
    image_id = torch.tensor([idx])
    if num_objs == 0:
        area = torch.tensor([0])  # also tried torch.tensor([])
    else:
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
    iscrowd = torch.zeros((num_objs,), dtype=torch.int64)
    target = {}
    target["boxes"] = boxes
    target["labels"] = labels
    target["image_id"] = image_id
    target["area"] = area
    target["iscrowd"] = iscrowd
    if self.transforms is not None:
        img, target = self.transforms(img, target)

    return img, target
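For reference, this is roughly what a zero-object target with consistent shapes could look like (a sketch of my own; whether the torchvision 0.3.0 transform code accepts empty `(0, 4)` boxes is exactly what seems to fail here):

```python
import torch

# Sketch of a target dict for an image with no objects; the key point is
# that "boxes" keeps a 2-D (0, 4) shape instead of collapsing to 1-D.
num_objs = 0
target = {
    "boxes": torch.zeros((num_objs, 4), dtype=torch.float32),
    "labels": torch.zeros((num_objs,), dtype=torch.int64),
    "image_id": torch.tensor([0]),
    "area": torch.zeros((num_objs,), dtype=torch.float32),
    "iscrowd": torch.zeros((num_objs,), dtype=torch.int64),
}
print(target["boxes"].shape)  # torch.Size([0, 4])
```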

in combination with the engine in references/detection/engine.py.
Now, if an empty image is selected, I get the following error message:
Traceback (most recent call last):

File "/home/user/Detection_Experiments/Model_Base.py", line 155, in <module>
model, optimizer, loss, lr_scheduler=main()
File "/home/user/Detection_Experiments/Model_Base.py", line 146, in main
loss=train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
File "/home/user/Detection_Experiments/references/detection/engine.py", line 30, in train_one_epoch
loss_dict = model(images, targets)
File "/home/user/.virtualenvs/test_p3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/.virtualenvs/test_p3/lib/python3.6/site-packages/torchvision-0.3.0-py3.6-linux-x86_64.egg/torchvision/models/detection/generalized_rcnn.py", line 47, in forward
File "/home/user/.virtualenvs/test_p3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/.virtualenvs/test_p3/lib/python3.6/site-packages/torchvision-0.3.0-py3.6-linux-x86_64.egg/torchvision/models/detection/transform.py", line 40, in forward
File "/home/user/.virtualenvs/test_p3/lib/python3.6/site-packages/torchvision-0.3.0-py3.6-linux-x86_64.egg/torchvision/models/detection/transform.py", line 74, in resize
File "/home/user/.virtualenvs/test_p3/lib/python3.6/site-packages/torchvision-0.3.0-py3.6-linux-x86_64.egg/torchvision/models/detection/transform.py", line 135, in resize_boxes
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

Can anyone tell me how my tensors should look if I want to use an empty image with the standard Torchvision Faster RCNN?

Thank you in advance :slight_smile:


I am also trying to implement Mask RCNN following this tutorial. May I ask where you downloaded references/detection/engine.py? I just can’t find it anywhere.

Hi,

It’s in the torchvision GitHub repo:

Having exactly the same question…did you find a solution yet? Does it even make sense to train on images with no object at all?

Does anyone have a working training script with the torchvision faster rcnn implementation?
I am trying to train from scratch with coco but I keep getting issues with tensor sizes in the roi code.

(Translated from Chinese:)

The original link is in the Colab version of the TorchVision Object Detection Finetuning Tutorial; it points to https://github.com/pytorch/vision/tree/v0.3.0/references/detection.
However, after the cocoapi was updated there is a small incompatibility, which produces an error similar to the OP’s.
I have put my fixed version in the Baidu Netdisk below:
Link: https://pan.baidu.com/s/1DCw_ApFSVlcxqoCILLc2tg
Extraction code: jihx

You can try this.

@FrankZLuffy I would like to ask you to translate your posts before posting :slight_smile:

I’m having the same issue. For now I’m dynamically removing images from my batches that do not contain any objects using a custom collate function but I guess it would be beneficial to include these images too in the training loop.
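For anyone curious, a minimal version of such a collate function might look like this (a sketch; the function name is mine, and the default detection collate in references/detection/utils.py just does `tuple(zip(*batch))`):

```python
import torch

def collate_fn_drop_empty(batch):
    """Drop samples whose target contains no boxes, then batch as usual.

    Caveat: if every sample in a batch is empty, this returns empty tuples,
    which the training loop would still need to guard against.
    """
    batch = [(img, tgt) for img, tgt in batch if tgt["boxes"].shape[0] > 0]
    return tuple(zip(*batch))

# Usage (illustrative):
# loader = DataLoader(dataset, batch_size=2, collate_fn=collate_fn_drop_empty)
```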

Any help would be appreciated.


It seems that it is not possible to use images without annotations for training: https://github.com/pytorch/vision/issues/1128
I’ve tried to pass boxes of size 0 but then ‘loss_rpn_box_reg’ becomes inf.


That’s because the size 0 box has width of 0, which leads to inf for torch.log(gt_widths / ex_widths).
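A quick check confirms this (a sketch of the offending term in the box-regression encoding, not the actual torchvision code):

```python
import torch

gt_widths = torch.tensor([0.0])   # degenerate ground-truth box: width 0
ex_widths = torch.tensor([10.0])  # a normal anchor/proposal width

# The regression target for width is log(gt_width / ex_width); with a
# zero-width box this is log(0) = -inf, which then poisons the loss.
dw = torch.log(gt_widths / ex_widths)
print(dw)  # tensor([-inf])
```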

I have the same question about how to deal with images without objects. Is there a standard process? Should we (1) filter out images without the desired objects, or (2) is there a smarter way to handle them?

I used a hacky way: I always include a bounding box with label “0” for the background, since the PyTorch tutorial mentions the model requires a background class and a person class. Hope this helps.

Hi @TriEightz,

Could you elaborate on what values you added to your bounding box with label “0” for a background image?

I’ve tried setting an all-zero tensor, for instance [0, 0, 0, 0], as a bounding box for images with no objects and set the label to “0”, but received the same error as @Seba.
As tengerye pointed out, option 1 is currently being used even in detectron2.

Any explanation that you or anyone else could provide on how to handle negative images (i.e. images with only background and no object) would be really appreciated.

@JF7 I set the values to be (0,1,2,3) for label “0”. so the box actually does have a size, but so far it seems to work fine as it represents the background class and doesn’t affect training.
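Spelled out, that workaround would produce something like this for a negative image (a sketch; the (0, 1, 2, 3) coordinates are just the arbitrary nonzero-size box described above):

```python
import torch

# Dummy annotation for an image with no real objects: one small box with
# nonzero width and height, labelled 0 (background).
boxes = torch.tensor([[0.0, 1.0, 2.0, 3.0]], dtype=torch.float32)  # x1, y1, x2, y2
labels = torch.zeros((1,), dtype=torch.int64)

# Same area formula as in the dataset code above: (y2 - y1) * (x2 - x1).
area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
print(area)  # tensor([4.])
```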

Hi @TriEightz,

Thanks for the explanation. Much appreciated.

Did you happen to notice any improvement in overall performance of the model when you add label “0” only images to the training set?

What’s the current solution to this? Is the following proposed solution still the best?