Hello all,
I am trying to draw bounding boxes on the predictions from the object detection finetuning tutorial. I am only interested in the FastRCNNPredictor part, so I removed the parts I did not need. The model definition is now as follows:
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
def get_model_instance_segmentation(num_classes):
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model
Training runs for 10 epochs without problems. The code for drawing the bounding boxes is as follows:
from torchvision.utils import draw_bounding_boxes
img, _ = dataset_test[0]
# put the model in evaluation mode
model.eval()
with torch.no_grad():
    prediction = model([img.to(device)])
width=4
box1=prediction[0]['boxes'][0,:].byte().cpu().numpy()
box2=prediction[0]['boxes'][1,:].byte().cpu().numpy()
box3=prediction[0]['boxes'][2,:].byte().cpu().numpy()
box4=prediction[0]['boxes'][3,:].byte().cpu().numpy()
box5=prediction[0]['boxes'][4,:].byte().cpu().numpy()
box6=prediction[0]['boxes'][5,:].byte().cpu().numpy()
box7=prediction[0]['boxes'][6,:].byte().cpu().numpy()
box8=prediction[0]['boxes'][7,:].byte().cpu().numpy()
box9=prediction[0]['boxes'][8,:].byte().cpu().numpy()
box10=prediction[0]['boxes'][9,:].byte().cpu().numpy()
boxes=[box1, box2, box3, box4, box5, box6, box7, box8, box9, box10]
boxes = torch.tensor(boxes, dtype=torch.int)
#boxes = boxes.unsqueeze(0)
img=img*255
img=torch.tensor(img,dtype=torch.uint8)
labels=prediction[0]['labels'].byte().cpu().numpy()
img_with_boxes = draw_bounding_boxes(img, boxes, width, labels)
img_with_boxes = torchvision.transforms.ToPILImage()(img_with_boxes)
img_with_boxes.show()
fig=show(img_with_boxes)
fig.save(os.path.join('./output', 'img_with_boxes.jpg'))
The prediction contains 10 boxes, which is why I assign 10 boxes above. The prediction is as follows:
> [{'boxes': tensor([[ 64.6042, 39.6047, 194.9972, 323.6084],
> [182.3480, 23.9118, 274.2413, 333.0615],
> [174.8589, 30.9284, 260.7455, 207.5839],
> [124.9133, 34.3869, 228.6437, 319.2509],
> [124.1383, 30.9033, 193.8206, 234.9681],
> [ 12.3998, 0.0000, 100.9321, 337.0988],
> [ 72.6281, 39.6080, 152.4279, 240.3065],
> [ 47.6792, 37.2603, 118.6936, 216.3365],
> [152.3832, 32.2648, 222.9659, 213.8243],
> [213.9930, 27.7466, 268.4226, 250.7414]], device='cuda:0'), 'labels': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], device='cuda:0'), 'scores': tensor([0.9645, 0.5296, 0.2010, 0.1875, 0.1495, 0.1228, 0.0854, 0.0563, 0.0552,
> 0.0525], device='cuda:0')}]
These are the only changes I made to the tutorial code. When I run it, I get the following error:
Traceback (most recent call last):
  File "/./py.py", line ***, in <module>
    img_with_boxes = draw_bounding_boxes(img, boxes, width, labels)
  File "/…/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/…/torchvision/utils.py", line 212, in draw_bounding_boxes
    elif len(labels) != num_boxes:
TypeError: object of type 'int' has no len()
Given the error above, I am fairly sure something is wrong in my bounding box code. In addition, the image picked from dataset_test contains only one object to detect, yet the prediction returns 10 boxes. Any advice, suggestions, or hints to get me past this problem would be really helpful.
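One observation on the 10 boxes: the detection model returns every candidate together with a confidence score, and in the prediction shown above only the first box has a high score (0.9645). A common approach is to keep only detections above a threshold. The sketch below uses the first three detections from that prediction; the threshold of 0.8 is an assumed value for illustration, not something from the tutorial. It also avoids `.byte()`, since uint8 cannot represent coordinates above 255 such as 323.6084.

```python
import torch

# First three detections copied from the prediction shown above (on CPU).
prediction = [{
    "boxes": torch.tensor([[64.6042, 39.6047, 194.9972, 323.6084],
                           [182.3480, 23.9118, 274.2413, 333.0615],
                           [174.8589, 30.9284, 260.7455, 207.5839]]),
    "labels": torch.tensor([1, 1, 1]),
    "scores": torch.tensor([0.9645, 0.5296, 0.2010]),
}]

score_threshold = 0.8  # assumed value; tune it for your own data

pred = prediction[0]
keep = pred["scores"] > score_threshold  # boolean mask over detections

# Round to whole pixels with a 64-bit integer type instead of .byte(),
# so coordinates larger than 255 are preserved.
kept_boxes = pred["boxes"][keep].round().to(torch.int64)
kept_labels = [str(int(i)) for i in pred["labels"][keep]]

print(kept_boxes.tolist(), kept_labels)
```

With a real model output on the GPU, the tensors would first need `.cpu()`; the filtered `kept_boxes` and `kept_labels` can then be passed to `draw_bounding_boxes` in place of the ten hand-assigned boxes.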
Thank you in advance and regards!