Hi,
I’m trying my hand at semantic segmentation using torchvision and the built-in tools it provides. What I don’t understand is how I’m supposed to use the targets I get from the VOCSegmentation 2012 dataset. Using the snippet below,
import torch
import torchvision.transforms as tf
import torchvision.datasets as dsets

train_set = dsets.VOCSegmentation(
    root='./data',
    year='2012',
    download=True,
    image_set='train',
    transform=tf.Compose([tf.RandomCrop(256), tf.ToTensor()]),
    target_transform=tf.Compose([tf.RandomCrop(256), tf.ToTensor()]),
)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=4, shuffle=False, num_workers=1
)

train_iter = iter(train_loader)
image, target = next(train_iter)
print(target[1][0])
I get the following output:
tensor([[0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
[0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
[0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
...,
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]])
I would have expected the target values to be integers between 0 and 20 (as the dataset has 20 object classes plus a background class). Why are there floats instead? Is this normalized in some way I don’t understand, or have I misunderstood how the targets are supposed to look?
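For what it’s worth, here is a quick sanity check I tried. My guess (unverified, so it may be wrong for palettized masks) is that ToTensor divides 8-bit pixel values by 255, in which case 0.0784 would correspond to a single integer class index:

```python
# Sanity check: could 0.0784 be an integer class index that ToTensor
# scaled by 1/255? (Assumption: ToTensor divides 8-bit values by 255.)
for label in range(256):
    if abs(label / 255 - 0.0784) < 1e-3:
        print(label)  # prints: 20
```

Only label 20 matches, so the 0.0784 entries might just be class index 20 after scaling, but I’d appreciate confirmation that this is what ToTensor is doing to the mask.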