I have a semantic segmentation task at hand. My input images are of size (2056, 2464, 3), and the network I am using is "fcn_resnet101". The input for this model should be 224x224, so I resize my images:
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

data_transforms = {
    'train': transforms.Compose([
        # transforms.RandomResizedCrop(input_size),
        transforms.Resize((input_size, input_size), Image.NEAREST),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((input_size, input_size), Image.NEAREST),
        transforms.CenterCrop(input_size),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
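For context, this is roughly how I load the model itself (only a sketch using torchvision's segmentation API; num_classes is a placeholder for my dataset):

import torch
from torchvision import models

num_classes = 5  # placeholder for the number of classes in my masks
model = models.segmentation.fcn_resnet101(pretrained=False, num_classes=num_classes)

x = torch.randn(2, 3, 224, 224)   # a dummy batch shaped like the transformed images
out = model(x)['out']             # per-pixel class scores
print(out.shape)                  # torch.Size([2, 5, 224, 224])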
class Train_Dataset(Dataset):
    def __init__(self, imgarray, labelarray, transform=None):
        self.images = imgarray
        self.labels = labelarray
        self.transform = transform

    def __getitem__(self, index):
        img = self.images[index]
        if self.transform is not None:
            img = self.transform(img)
        label_pil = Image.fromarray(self.labels[index])           # numpy array to PIL
        label_pil = label_pil.resize((224, 224), Image.NEAREST)   # resize the PIL image (nearest-neighbour)
        label = torch.from_numpy(np.asarray(label_pil))           # PIL -> numpy array -> tensor
        # print(img.size(), label.size())
        return img, label

    def __len__(self):
        return len(self.images)
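For completeness, this is roughly how I feed it to a DataLoader (the batch size and the train_imgs / train_masks arrays are just placeholders):

from torch.utils.data import DataLoader

train_dataset = Train_Dataset(train_imgs, train_masks, transform=data_transforms['train'])
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)

for img_batch, label_batch in train_loader:
    # img_batch: (B, 3, 224, 224) float tensor, label_batch: (B, 224, 224) class ids
    pass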
I am getting strange results when training: the training accuracy surpasses 100% in the second epoch, while the validation accuracy stops at 21 percent.
I want to solve this. I assumed that maybe resizing the images is the reason I'm getting this result. Should I split the input images into smaller 224x224 tiles and then combine the results at the end (see the sketch below), or is just resizing the correct approach? I read somewhere that we split a big input image when we can't read it into memory, but in PyTorch we have the DataLoader, so does that mean splitting is not necessary? Or do we need it anyway, since resizing means losing valuable information?
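To make the splitting idea concrete, this is roughly what I have in mind at inference time (only a sketch: model is the fcn_resnet101 above, img_tensor is a normalized full-resolution (3, H, W) tensor with the Resize step skipped, and border pixels that don't fill a complete tile are simply ignored):

import torch

def predict_by_tiling(model, img_tensor, tile=224):
    # Run the model on non-overlapping tile x tile patches and stitch the
    # per-pixel class predictions back into one full-size mask.
    _, H, W = img_tensor.shape
    full_pred = torch.zeros(H, W, dtype=torch.long)
    model.eval()
    with torch.no_grad():
        for top in range(0, H - tile + 1, tile):
            for left in range(0, W - tile + 1, tile):
                patch = img_tensor[:, top:top + tile, left:left + tile].unsqueeze(0)
                scores = model(patch)['out']   # (1, num_classes, tile, tile)
                full_pred[top:top + tile, left:left + tile] = scores.argmax(dim=1)[0]
    return full_pred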