Note that some torchvision segmentation models return a dict as their output rather than a plain tensor. Could you check whether that is the case for your model?
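If it helps, here is a minimal sketch of one way to check for this and handle both cases. As far as I know, models such as `fcn_resnet50` and `deeplabv3_resnet50` return an `OrderedDict` with the main prediction under the `"out"` key (plus `"aux"` when the auxiliary loss is enabled). The helper name `extract_logits` is hypothetical and not part of torchvision, and the strings below are stand-ins so the snippet runs without torch installed.

```python
from collections import OrderedDict
from collections.abc import Mapping


def extract_logits(output, key="out"):
    """Return the main prediction whether `output` is a mapping or a bare value.

    Hypothetical helper: `key="out"` assumes the torchvision convention of
    storing the main segmentation output under "out".
    """
    if isinstance(output, Mapping):
        return output[key]
    return output


# Stand-in values so the sketch runs without torch; in practice `output`
# would be whatever `model(x)` returns.
dict_style = OrderedDict(out="main logits", aux="auxiliary logits")
plain_style = "main logits"

print(type(dict_style).__name__)    # quick way to see what the model returned
print(extract_logits(dict_style))   # picks the "out" entry
print(extract_logits(plain_style))  # passed through unchanged
```

With a real model you would call `extract_logits(model(x))` before passing the result to the loss function or to `argmax`.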