# How to interpret targets in VOCSegmentation 2012

Hi,

I’m trying my hand at semantic segmentation using torchvision and the built-in tools it provides. What I don’t understand is how I’m supposed to use the targets I get from the VOCSegmentation 2012 dataset. Using the snippet below:

import torch
import torchvision.transforms as tf
import torchvision.datasets as dsets

train_set = dsets.VOCSegmentation(
    root='./data',
    year='2012',
    image_set='train',
    transform=tf.Compose([tf.RandomCrop(256), tf.ToTensor()]),
    target_transform=tf.Compose([tf.RandomCrop(256), tf.ToTensor()]),
)

train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=4, shuffle=False, num_workers=1
)
train_iter = iter(train_loader)

image, target = next(train_iter)

print(target[1][0])

I get the following output:

tensor([[0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
[0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
[0.0784, 0.0784, 0.0784, ..., 0.0000, 0.0000, 0.0000],
...,
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000],
[0.0000, 0.0000, 0.0000, ..., 0.0000, 0.0000, 0.0000]])

I would have expected the target to contain integers between 0 and 20 (as there are 20 classes in the dataset). Why are there floats? Is the target normalized in some way I don’t understand, or have I misunderstood how the targets are supposed to look?