Hi,
`torchvision.transforms.ToTensor`

> Convert a PIL Image or numpy.ndarray to tensor.
> Converts a PIL Image or numpy.ndarray (H x W x C) in the range [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
Why do we need to use ToTensor on the input images of a segmentation task? Is this related to the loss function?
thanks!
Normalized input (to [0, 1] or z-score) is usually beneficial for your training, i.e. your model might train faster and better.
However, in a segmentation use case you should be careful using ToTensor on your mask image, since this might destroy the labels.
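To illustrate, here is a minimal sketch (the mask values are made up): applying ToTensor to an integer mask divides the class indices by 255, so they are no longer valid labels.

```python
import numpy as np
from PIL import Image
from torchvision import transforms

# Hypothetical 2x2 mask containing the class indices 0, 1, and 2
mask = Image.fromarray(np.array([[0, 1], [2, 2]], dtype=np.uint8))

t = transforms.ToTensor()(mask)
print(t)
# tensor([[[0.0000, 0.0039],
#          [0.0078, 0.0078]]])
# The indices were scaled to [0, 1] and are no longer usable as class labels.
```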
Do I need to build two transform.Compose pipelines?
```python
import torchvision.transforms as transform

img_transform = transform.Compose([
    transform.RandomRotation(2),
    transform.ToTensor(),
    transform.Normalize([.485, .456, .406], [.229, .224, .225]),
])
label_transform = transform.Compose([
    transform.RandomRotation(2),
])
```
How can I guarantee the same angle of rotation?
For a segmentation use case, you should use the functional API of torchvision.transforms
as shown here.
```python
import torchvision.transforms.functional as TF

# Transform to tensor
image = TF.to_tensor(image)
mask = TF.to_tensor(mask)
```
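For the rotation question, a minimal sketch of the functional approach (the angle range of 2 degrees is taken from your Compose above; everything else is illustrative, and the mask is assumed to be a single-channel PIL image storing class indices directly): sample the random angle once, apply it to both the image and the mask, then convert only the image with to_tensor and keep the mask as integer class indices.

```python
import random

import numpy as np
import torch
import torchvision.transforms.functional as TF

def joint_transform(image, mask):
    # Sample one angle and reuse it for both inputs
    angle = random.uniform(-2, 2)
    image = TF.rotate(image, angle)
    mask = TF.rotate(mask, angle)  # rotate defaults to nearest interpolation

    # Only the image is scaled to [0, 1] and normalized
    image = TF.to_tensor(image)
    image = TF.normalize(image, [.485, .456, .406], [.229, .224, .225])

    # Keep the raw class indices in the mask (no ToTensor rescaling)
    mask = torch.as_tensor(np.array(mask), dtype=torch.long)
    return image, mask
```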
Does TF.to_tensor(mask) have the same effect as transform.ToTensor()?
And might TF.to_tensor(mask) also destroy the labels?
thanks!
Both methods have the same effect.
Yes, depending on the format of your target image. E.g. if your target image is a color image where each different color encodes a class, you should rather create a mapping between colors and class indices. ToTensor will normalize the image such that the values will be in [0, 1], which won't be suitable to use in e.g. nn.CrossEntropyLoss anymore, as it expects class indices in the range [0, nb_classes-1].
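A minimal sketch of such a color-to-class mapping (the palette here is made up; replace it with the colors of your dataset):

```python
import numpy as np
import torch
from PIL import Image

# Hypothetical palette: RGB color -> class index
COLOR_TO_CLASS = {
    (0, 0, 0): 0,      # background
    (255, 0, 0): 1,    # class 1
    (0, 255, 0): 2,    # class 2
}

def mask_to_class_indices(mask: Image.Image) -> torch.Tensor:
    """Convert an RGB color-coded mask to a LongTensor of class indices."""
    arr = np.array(mask.convert("RGB"))          # (H, W, 3), uint8
    indices = np.zeros(arr.shape[:2], dtype=np.int64)
    for color, cls in COLOR_TO_CLASS.items():
        matches = np.all(arr == color, axis=-1)  # (H, W) boolean mask
        indices[matches] = cls
    return torch.from_numpy(indices)             # usable with nn.CrossEntropyLoss
```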