I saved a model’s state dict and am trying to load it back to perform inference on random images that weren’t in the training/validation dataset. Here’s my code:
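import torch
from PIL import Image
from torchvision.transforms import functional as F

VERSION = 1
ROOT = "some/path/"
STATE_DICT_PATH = ROOT + 'state_dict.pth'
num_classes = 2
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
# get_instance_segmentation_model is my own helper, defined elsewhere
model = get_instance_segmentation_model(VERSION, num_classes)
# map_location lets a checkpoint saved on GPU load on a CPU-only machine too
model.load_state_dict(torch.load(STATE_DICT_PATH, map_location=device))
model.to(device)
img = Image.open(ROOT + "images/some_image.jpg")
img = F.pil_to_tensor(img)  # uint8 tensor, shape [C, H, W]
img = F.convert_image_dtype(img)  # float32 in [0, 1]
model.eval()
with torch.no_grad():
    out = model([img.to(device)])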
The error I get: RuntimeError: The expanded size of the tensor (14204) must match the existing size (0) at non-singleton dimension 1. Target sizes: [10652, 14204]. Tensor sizes: [0, 0]
If however I rebuild the training dataset like in the tutorial:
it runs fine. I don’t understand why, since in both cases I’m passing a PIL image converted to a tensor to the model… One note: my original images are rather large (14000x10000 pixels), so I had them resized for training, but I would like to predict on the full-size sources.
  File "individualizador/predict.py", line 42, in <module>
    out = model([img.to(device)])
  File "/home/ims/.local/share/virtualenvs/individualizacao-BUoPwUHr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ims/.local/share/virtualenvs/individualizacao-BUoPwUHr/lib/python3.8/site-packages/torchvision/models/detection/generalized_rcnn.py", line 106, in forward
    detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes) # type: ignore[operator]
  File "/home/ims/.local/share/virtualenvs/individualizacao-BUoPwUHr/lib/python3.8/site-packages/torchvision/models/detection/transform.py", line 261, in postprocess
    masks = paste_masks_in_image(masks, boxes, o_im_s)
  File "/home/ims/.local/share/virtualenvs/individualizacao-BUoPwUHr/lib/python3.8/site-packages/torchvision/models/detection/roi_heads.py", line 484, in paste_masks_in_image
    res = [paste_mask_in_image(m[0], b, im_h, im_w) for m, b in zip(masks, boxes)]
  File "/home/ims/.local/share/virtualenvs/individualizacao-BUoPwUHr/lib/python3.8/site-packages/torchvision/models/detection/roi_heads.py", line 484, in <listcomp>
    res = [paste_mask_in_image(m[0], b, im_h, im_w) for m, b in zip(masks, boxes)]
  File "/home/ims/.local/share/virtualenvs/individualizacao-BUoPwUHr/lib/python3.8/site-packages/torchvision/models/detection/roi_heads.py", line 424, in paste_mask_in_image
    im_mask[y_0:y_1, x_0:x_1] = mask[(y_0 - box[1]) : (y_1 - box[1]), (x_0 - box[0]) : (x_1 - box[0])]
RuntimeError: The expanded size of the tensor (14204) must match the existing size (0) at non-singleton dimension 1. Target sizes: [10652, 14204]. Tensor sizes: [0, 0]
@ptrblck it definitely seems related to image size. Resizing the input by the same factor I used for the dataset gets rid of the error. Did I train a model that can’t handle my full-size images? I resized because I was running into RAM issues with the masks: an image containing 30 object instances would create a [30, 10000, 14000] mask tensor.
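To put a number on the RAM problem, here is a rough sketch of how much memory a dense [N, H, W] mask tensor takes. The helper name `mask_tensor_bytes` and the 0.5 resize factor are made up for illustration (the thread does not state the actual factor), and the estimate ignores any temporary copies made along the way.

```python
def mask_tensor_bytes(num_instances, height, width, bytes_per_element=1):
    """Size in bytes of a dense [N, H, W] mask tensor."""
    return num_instances * height * width * bytes_per_element


# 30 instances on a 10000 x 14000 image, as described above
full_uint8 = mask_tensor_bytes(30, 10000, 14000)        # 1 byte per element
full_float32 = mask_tensor_bytes(30, 10000, 14000, 4)   # 4 bytes per element

# Hypothetical: both sides scaled by 0.5 (the real factor is not given)
half_uint8 = mask_tensor_bytes(30, 5000, 7000)

print(f"full size, uint8:   {full_uint8 / 1e9:.2f} GB")
print(f"full size, float32: {full_float32 / 1e9:.2f} GB")
print(f"half size, uint8:   {half_uint8 / 1e9:.2f} GB")
```

Memory grows with the product of both sides, so halving each side cuts the mask tensor to a quarter.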
I’m not sure if your model is able to detect the objects using a different input resolution or if it needs more training on these. Generally I would expect to see differences in CNNs as the kernels were trained to extract features from the inputs used during training while these features might be different now. In a very simplified way think about the first kernels as edge detectors, which could fire a strong signal for a low resolution image (assuming these conv layers were trained on it) while the “edge” might expand to a larger pixel space in a high-res image and thus these edge detectors would see an almost constant input.
Ah ok, that’s probably what’s going on. I thought Mask R-CNN was insensitive to scale differences since it internally rescales the inputs anyway. I’ll try to find a sweet spot between image size and RAM consumption to see if I can get this to work, thanks!
edit: would tiling the images so they fit in memory help in this case?
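For reference, a minimal sketch of what the tiling could look like: it only computes overlapping crop boxes, and everything in it (the name `tile_boxes`, the tile size of 2048, the overlap of 256) is an assumption for illustration, not something from the thread. Whether tiling actually helps depends on whether objects inside a tile look like the objects the model saw during training.

```python
def tile_boxes(width, height, tile, overlap):
    """Return (x0, y0, x1, y1) crop boxes that cover a width x height image."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap

    def starts(size):
        # A single tile is enough if the image side fits inside it
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the border
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


# Full-size image from the error message: 14204 wide, 10652 high
boxes = tile_boxes(14204, 10652, tile=2048, overlap=256)
print(len(boxes), "tiles, first:", boxes[0], "last:", boxes[-1])
```

Each box can be passed to PIL's `Image.crop`. Predicted boxes and masks would then have to be shifted back by each tile's `(x0, y0)` offset, and objects that fall in the overlap between two tiles would need to be merged or de-duplicated, which this sketch does not attempt.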
However, IMNO (in my noob opinion) a more informative error could help in this case. Maybe check if the masks are empty before passing them to paste_masks_in_image and raise earlier so the user knows nothing was detected? Not sure how global this solution would be though, just my two cents. The way it is now looks like there’s a problem with my input.
That’s a good point and I might be wrong if these images are indeed resized internally.
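To make the internal resizing concrete, here is a small sketch of the scale factor as I understand the rule in torchvision's `GeneralizedRCNNTransform`: the shorter side is scaled towards `min_size`, unless that would push the longer side past `max_size`. The values 800 and 1333 are the torchvision Mask R-CNN defaults; whether the model in this thread uses them is an assumption, as is the half-size training image in the second call.

```python
def internal_scale(height, width, min_size=800, max_size=1333):
    """Scale factor applied to an image before it reaches the backbone.

    Paraphrases the resizing rule; assumes the default min_size/max_size.
    """
    return min(min_size / min(height, width), max_size / max(height, width))


# Full-size image from the error message
full = internal_scale(10652, 14204)
print(f"full size: scale {full:.4f} -> {10652 * full:.0f} x {14204 * full:.0f}")

# Hypothetical half-size image (the actual training size is not given)
half = internal_scale(5326, 7102)
print(f"half size: scale {half:.4f} -> {5326 * half:.0f} x {7102 * half:.0f}")
```

Under these assumptions both inputs end up with a shorter side of about 800 pixels, so the backbone would see roughly the same resolution in either case. Predictions are mapped back to the original image size afterwards, and the traceback above fails in that postprocessing step.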
Yes, I also think the error might be improved. However, I’m unsure if a value check is strictly needed as it could add a synchronization which you generally want to avoid.
CC @pmeier for visibility
Providing a better error message is usually a good thing. Given that there might be some perf regressions, we need to be careful where we would put such a check. @martimpassos Can you provide an MRE that either doesn’t rely on your data, e.g. replace image with torch.rand() or the like and so on, or provide your data, so we can reproduce?