torchvision.datasets.CocoDetection returns a tensor for each image and a list of tensors for the segmentations in that image, and I'm struggling to understand how to work with this for semantic segmentation training.
I think I need to convert this list of segmentations into binary masks, but I'm having trouble figuring out how.
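What I have in mind is something along these lines (only a rough sketch of the idea; anns_to_class_mask is a name I made up, and coco is the pycocotools COCO object that CocoDetection keeps as dataset.coco):

import numpy as np

def anns_to_class_mask(coco, anns, height, width):
    # One single-channel mask per image: each pixel stores the category id
    # of the annotation covering it, 0 for background.
    mask = np.zeros((height, width), dtype=np.uint8)
    for ann in anns:
        binary = coco.annToMask(ann)  # H x W array of 0/1 for this annotation
        mask[binary == 1] = ann['category_id']
    return mask

My training loop currently looks like this: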
for images, labels in dataloaders_dict[phase]:
    # skip single-sample batches
    if images.size()[0] == 1:
        continue
    images = images.to(device)
    labels = torch.squeeze(labels)
    labels = labels.to(device)

    # count is a gradient-accumulation counter set earlier in the epoch loop (not shown)
    if (phase == 'train') and (count == 0):
        optimizer.step()
        optimizer.zero_grad()

    with torch.set_grad_enabled(phase == 'train'):
        outputs = net(images)
        loss = criterion(outputs, labels.long())
But this is not working: the loss drops suspiciously low, and when I run inference on test pictures, every pixel is classified as black, which I guess means non-object/background. I have no idea what to do. Any suggestions?
|| Loss: 0.2654 || 10iter: 45.0074 sec.
|| Loss: 0.0304 || 10iter: 34.7093 sec.
|| Loss: 0.0018 || 10iter: 53.4208 sec.
|| Loss: 0.0001 || 10iter: 34.5700 sec.
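One sanity check I'm planning to add inside the loop (just a quick sketch, not in my code yet) is to look at what values the label tensor actually contains, since an almost-all-zero target would explain both the tiny loss and the all-black predictions:

# if almost every pixel is 0, predicting background everywhere already gives a tiny loss
print(torch.unique(labels, return_counts=True))
print('foreground pixels:', (labels > 0).sum().item(), 'of', labels.numel())

The mask conversion I currently have looks like this: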
from pycocotools.mask import frPyObjects, decode
import numpy as np
from PIL import Image

for category in anns:
    seg_rle = category['segmentation']
    # frPyObjects expects (height, width); raw_img.size is (width, height)
    tmp = decode(frPyObjects(seg_rle, raw_img.size[1], raw_img.size[0]))
    if tmp.ndim == 3:
        # collapse the H x W x N stack of polygon masks into one binary mask
        tmp = np.clip(np.sum(tmp, axis=2), 0, 1).astype(np.uint8)
    category['segmentation'] = tmp

for category in anns:
    pilImg = Image.fromarray(category['segmentation'])
    # PIL's resize takes (width, height), so pass raw_img.size as-is
    anns_img = pilImg.resize(raw_img.size, resample=Image.NEAREST)
I tried this variation in my dataloader, but with no luck.
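For completeness, this is roughly how I imagine wiring the conversion into the dataset, using the anns_to_class_mask helper from the sketch above (again only a sketch, assuming no transform is passed so the image is still a PIL Image at this point):

import torch
from torchvision.datasets import CocoDetection

class CocoSemSeg(CocoDetection):
    # return (image, class-index mask) instead of (image, list of annotation dicts)
    def __getitem__(self, index):
        img, anns = super().__getitem__(index)
        width, height = img.size  # PIL .size is (width, height)
        mask = anns_to_class_mask(self.coco, anns, height, width)
        return img, torch.as_tensor(mask, dtype=torch.long)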