How to interpret targets in VOCSegmentation 2012

Hi,

I’m trying my hand at semantic segmentation using torchvision and the built-in tools it provides. What I don’t understand is how I’m supposed to use the targets I get from the VOCSegmentation 2012 dataset. Using the snippet below:

import torch
import torchvision.transforms as tf
import torchvision.datasets as dsets

train_set = dsets.VOCSegmentation(
    root='./data'
    ,year='2012'
    ,download=True
    ,image_set= 'train'
    ,transform = tf.Compose([tf.RandomCrop(256), tf.ToTensor()])
    ,target_transform = tf.Compose([tf.RandomCrop(256), tf.ToTensor()])
)

train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=4, shuffle=False, num_workers=1
)

train_iter = iter(train_loader)
image, target = next(train_iter)

print(target[1][0])

I get the following output:


tensor([[0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
    [0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
    [0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
    ...,
    [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
    [0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]])

I would have expected the target to contain integers between 0 and 20 (as there are 20 classes in the dataset). Why are there floats instead? Is the target normalized in some way I don’t understand, or have I misunderstood what the targets are supposed to look like?
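
One quick sanity check on the printed numbers: tf.ToTensor() is documented to scale 8-bit PIL images from [0, 255] to [0.0, 1.0], so the floats may simply be the integer labels divided by 255. The sketch below tests that assumption on the values from the output above. The helper name scaled_to_class_index is made up for illustration and is not part of torchvision.

```python
def scaled_to_class_index(value, scale=255):
    """Undo an assumed division by `scale` and round to the nearest integer label."""
    return int(round(value * scale))


# Values copied from the printed tensor above.
printed_values = [0.0784, 0.0000]

# Assumption: the target was divided by 255 somewhere in the transform pipeline.
print([scaled_to_class_index(v) for v in printed_values])  # prints [20, 0]
```

If that assumption holds, 0.0784 corresponds to class 20 and 0.0000 to background, and a value of 1.0 would correspond to 255, which VOC uses for the void/boundary label. In that case, a target transform that does not rescale (for example tf.PILToTensor() in place of tf.ToTensor()) would presumably keep the targets as integers.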