I’ve searched on Google and in the suggested threads here for an answer to this, but couldn’t find one.
So, let’s say I have these 4 images as input:
########################
### Images / Dataset ###
import torch
import torch.nn as nn
from torch.utils.data import Dataset

Image1 = torch.rand((3, 255, 255))
Image2 = torch.rand((3, 320, 320))
Image3 = torch.rand((3, 320, 320))
Image4 = torch.rand((3, 120, 120))
I tried to create a simple dataset and data loader for this:
def VariedSizedImagesCollate(batch):
    # Keep the images as a plain Python list instead of stacking them into one tensor
    return [item for item in batch]

class Images_X_Dataset(Dataset):
    def __init__(self, ListOfImages):
        self.data = ListOfImages

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)
MyDataset = Images_X_Dataset([Image1, Image2, Image3, Image4])
MyDataLoader = torch.utils.data.DataLoader(dataset = MyDataset, batch_size = 4, shuffle = True, collate_fn = VariedSizedImagesCollate, pin_memory = True)
We can take one batch and put it in the variable MyX:
MyX = next(iter(MyDataLoader))
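Just to confirm what the custom collate gives back, printing the batch shows a plain Python list of differently-sized tensors (the order depends on the shuffle):

print(type(MyX))  # <class 'list'>
for i, img in enumerate(MyX):
    print("Image", i, "shape:", img.shape)
# e.g. Image 0 shape: torch.Size([3, 120, 120])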
Now we have a simple fully convolutional network (so that the network itself can handle different-sized pictures without any tricks).
#############################################
### Model to work with Varied Size Images ###
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.Conv1 = nn.Conv2d(in_channels = 3, out_channels = 32, kernel_size = (7, 7))
        self.Conv2 = nn.Conv2d(in_channels = 32, out_channels = 64, kernel_size = (5, 5))
        self.Decv1 = nn.ConvTranspose2d(in_channels = 64, out_channels = 1, kernel_size = (4, 4))
        self.Sigm1 = nn.Sigmoid()

    def forward(self, X):
        out = self.Conv1(X)
        out = self.Conv2(out)
        out = self.Decv1(out)
        out = self.Sigm1(out)
        return out
model = Net()
Now if I pass the 1st image of MyX to the instantiated model (unsqueezed to add a batch dimension), I get an output as I should:
MyOutImg = model(MyX[0].unsqueeze(0))
print("Original shape of 1st image:", MyX[0].shape, "Output's 1st image shape: ", MyOutImg[0].shape)
Original shape of 1st image: torch.Size([3, 120, 120]) Output's 1st image shape: torch.Size([1, 113, 113])
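In fact, looping over the list and feeding the images one at a time works for every image, so a per-image fallback is possible (a minimal sketch; unsqueeze adds the batch dimension each Conv2d expects):

for img in MyX:
    out = model(img.unsqueeze(0))  # 1 x C x H x W in, 1 x 1 x (H-7) x (W-7) out
    print(img.shape, "->", out.shape)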
The problem arises when I want to pass the whole batch:
MyOutImg = model(MyX)
print("Original shape of 1st image:", MyX[0].shape, "Output's 1st image shape: ", MyOutImg[0].shape)
TypeError: conv2d(): argument 'input' (position 1) must be Tensor, not list
Which I understand: the batch can’t be a single tensor, because the pictures have different sizes, and packing N pictures into one tensor (NxCxHxW) requires them all to have the same height and width. That’s why VariedSizedImagesCollate() returns a list.
I tried torch.cat and torch.stack, but they both seem to require same-size images. So what’s a way I can pass the whole batch?
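To be concrete, here’s the failure I hit with both (a minimal reproduction on the batch above):

torch.stack(MyX)       # RuntimeError: stack expects each tensor to be equal size
torch.cat(MyX, dim=0)  # RuntimeError too: the H and W dims don't match across images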
I also need it to work with backpropagation as well, but I guess if it works in the forward pass it should work in the backward pass too.
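For what it’s worth, the per-image loop does backpropagate: summing the per-image losses and calling backward() once accumulates gradients across the whole batch. A minimal sketch (the targets here are random, just for illustration):

import torch.nn.functional as F

optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
optimizer.zero_grad()
loss = 0.0
for img in MyX:
    out = model(img.unsqueeze(0))
    target = torch.rand_like(out)              # dummy target, illustration only
    loss = loss + F.binary_cross_entropy(out, target)
loss = loss / len(MyX)                         # average over the batch
loss.backward()                                # gradients flow through every image
optimizer.step()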