Object detection finetuning tutorial | draw bounding box | FastRCNNPredictor

Hello all,

I am trying to add bounding-box drawing to the object detection finetuning tutorial. I am only interested in the FastRCNNPredictor part, so I removed the parts of the code I don't need. The model definition is now:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
def get_model_instance_segmentation(num_classes):
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model

Training runs for 10 epochs without problems. The code that draws the bounding boxes is:

from torchvision.utils import draw_bounding_boxes

img, _ = dataset_test[0]
# put the model in evaluation mode
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])

width=4
box1=prediction[0]['boxes'][0,:].byte().cpu().numpy()
box2=prediction[0]['boxes'][1,:].byte().cpu().numpy()
box3=prediction[0]['boxes'][2,:].byte().cpu().numpy()
box4=prediction[0]['boxes'][3,:].byte().cpu().numpy()
box5=prediction[0]['boxes'][4,:].byte().cpu().numpy()
box6=prediction[0]['boxes'][5,:].byte().cpu().numpy()
box7=prediction[0]['boxes'][6,:].byte().cpu().numpy()
box8=prediction[0]['boxes'][7,:].byte().cpu().numpy()
box9=prediction[0]['boxes'][8,:].byte().cpu().numpy()
box10=prediction[0]['boxes'][9,:].byte().cpu().numpy()
boxes=[box1, box2, box3, box4, box5, box6, box7, box8, box9, box10]
boxes = torch.tensor(boxes, dtype=torch.int)
#boxes = boxes.unsqueeze(0)
img=img*255
img=torch.tensor(img,dtype=torch.uint8)
labels=prediction[0]['labels'].byte().cpu().numpy()
img_with_boxes = draw_bounding_boxes(img, boxes, width, labels)
img_with_boxes = torchvision.transforms.ToPILImage()(img_with_boxes)
img_with_boxes.show()
fig=show(img_with_boxes)
fig.save(os.path.join('./output', 'img_with_boxes.jpg'))

The prediction contains 10 boxes, so I build 10 boxes by hand. The prediction is:

> [{'boxes': tensor([[ 64.6042,  39.6047, 194.9972, 323.6084],
>         [182.3480,  23.9118, 274.2413, 333.0615],
>         [174.8589,  30.9284, 260.7455, 207.5839],
>         [124.9133,  34.3869, 228.6437, 319.2509],
>         [124.1383,  30.9033, 193.8206, 234.9681],
>         [ 12.3998,   0.0000, 100.9321, 337.0988],
>         [ 72.6281,  39.6080, 152.4279, 240.3065],
>         [ 47.6792,  37.2603, 118.6936, 216.3365],
>         [152.3832,  32.2648, 222.9659, 213.8243],
>         [213.9930,  27.7466, 268.4226, 250.7414]], device='cuda:0'), 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0'), 'scores': tensor([0.9645, 0.5296, 0.2010, 0.1875, 0.1495, 0.1228, 0.0854, 0.0563, 0.0552,
>         0.0525], device='cuda:0')}]

These are the only changes I made to the tutorial code. Running it gives the following error:

Traceback (most recent call last):
  File "/./py.py", line ***, in
    img_with_boxes = draw_bounding_boxes(img, boxes, width, labels)
  File "/…/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/…/torchvision/utils.py", line 212, in draw_bounding_boxes
    elif len(labels) != num_boxes:
TypeError: object of type 'int' has no len()

I am fairly sure something is wrong in my bounding-box code, since it raises the error above. In addition, the image picked from dataset_test has only one object to detect, yet the prediction contains 10 boxes. I would appreciate any advice, suggestions, or hints to get past this problem.

Thank you in advance and regards!

labels seems to be an int here:

labels=prediction[0]['labels'].byte().cpu().numpy()
img_with_boxes = draw_bounding_boxes(img, boxes, width, labels)

while I guess an array, tensor, or list is expected, all of which would work with len.
Could you check what type labels had before you changed the code and make sure it is still the same now?
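
One way to run that check is to print the type of each argument just before the call. This is a minimal sketch in plain Python: `describe` is a hypothetical helper, and the sample values only mimic the ones in the question (the int 4 for width, a list standing in for the ten predicted labels).

```python
def describe(name, value):
    """Hypothetical helper: report an argument's type and its length, if any."""
    try:
        size = len(value)
    except TypeError:
        # Plain ints (and other unsized objects) end up here.
        size = None
    return f"{name}: type={type(value).__name__}, len={size}"


# Stand-in values: width is the int from the question, labels mimics the
# ten predicted class ids.
width = 4
labels = [1] * 10

report = [describe("width", width), describe("labels", labels)]
for line in report:
    print(line)
# width: type=int, len=None
# labels: type=list, len=10
```

Calling the same helper on each value passed to draw_bounding_boxes should show which one is the int.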

Greetings! Thank you very much for your reply.

…From what I understand of the tutorial so far, the labels are ints (0 for background and 1 for the object). As I am a newbie, I am unfortunately not familiar with the len method, but I am keen to try this out in the tutorial. Could anyone show me how to use it there?

It would also be great to know whether anyone has tried the bounding box feature in the PyTorch object detection finetuning tutorial.

Thank you in advance.

The built-in len function calls an object's __len__ method and returns its length.
E.g. if you call it with a list, the number of elements (which is the length of the list) will be returned:

labels = [0, 1]
len(labels)
# 2

However, in your new code it seems that labels is now a single int value, which doesn't implement __len__, so the call raises the error:

a = 0
len(a)
# TypeError: object of type 'int' has no len()
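
For the call in the question, the int appears to come from argument order rather than from the prediction itself. In torchvision, `draw_bounding_boxes` takes `labels` as its third parameter and `width` further along, so `draw_bounding_boxes(img, boxes, width, labels)` passes `width=4` in the `labels` slot. The sketch below reproduces this with a stand-in function. `draw_boxes_stub` is hypothetical: it only mirrors the leading parameter order and does not draw anything, and the box values are truncated placeholders.

```python
def draw_boxes_stub(image, boxes, labels=None, colors=None, fill=False, width=1):
    """Hypothetical stand-in with the same leading parameter order as
    torchvision.utils.draw_bounding_boxes; it only validates and echoes."""
    if labels is not None and len(labels) != len(boxes):
        raise ValueError("number of labels must match number of boxes")
    return {"labels": labels, "colors": colors, "width": width}


# Placeholder inputs; real code would pass tensors here.
img = None
boxes = [[64, 39, 194, 323], [182, 23, 274, 333]]
class_ids = [1, 1]
width = 4

# Positional call as in the question: width is bound to `labels`.
try:
    draw_boxes_stub(img, boxes, width, class_ids)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)
print(positional_error)  # object of type 'int' has no len()

# Keyword call: each value reaches the parameter it was meant for.
# Labels are converted to strings here since they are drawn as text.
result = draw_boxes_stub(img, boxes, labels=[str(c) for c in class_ids], width=width)
print(result["labels"], result["width"])  # ['1', '1'] 4
```

With the real function, the corresponding call would be `draw_bounding_boxes(img, boxes, labels=[str(l) for l in labels], width=width)`. Two other things may be worth checking in the question's code: `.byte()` converts to uint8, which cannot hold coordinates above 255 such as 323.6084, and most of the 10 predicted boxes have low scores, so keeping only boxes whose entry in `prediction[0]['scores']` exceeds a chosen threshold would likely remove the extra detections.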