Hi, I am trying to clear up a doubt about the shape of the input tensor. I’m aware that PyTorch’s image layers (e.g. convolutions) expect the shape [batch_size, num_channels, H, W].
class auto_encode1(nn.Module):
    def __init__(self, encoding_size=3):
        super(auto_encode1, self).__init__()
        self.encoding_size = encoding_size
        self.input_size = 369
        self.encoder = nn.Sequential(
            nn.Linear(self.input_size, 250), nn.ReLU(),
            nn.Linear(250, 125), nn.ReLU(),
            nn.Linear(125, 60), nn.ReLU(),
            nn.Linear(60, self.encoding_size), nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.Linear(self.encoding_size, 60), nn.ReLU(),
            nn.Linear(60, 125), nn.ReLU(),
            nn.Linear(125, 250), nn.ReLU(),
            nn.Linear(250, self.input_size), nn.Tanh(),
        )

    def encode(self, x):
        # print(x.size())
        x = x.reshape(x.size(0), -1)
        return self.encoder(x)

    def decode(self, x):
        return self.decoder(x)

    def forward(self, x):
        print(x.shape)
        x1 = self.encode(x)
        xd = self.decode(x1)
        return xd
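For reference, here is a minimal self-contained sketch (with a hypothetical dummy input, reusing the same layer sizes as the encoder above) showing that the linear stack maps the last dimension 369 → encoding_size regardless of how many pixel rows a sample has:

```python
import torch
import torch.nn as nn

# Minimal sketch, not the full class above: the encoder stack maps the
# last dimension 369 -> 3, for any number of leading rows (pixels).
encoder = nn.Sequential(
    nn.Linear(369, 250), nn.ReLU(),
    nn.Linear(250, 125), nn.ReLU(),
    nn.Linear(125, 60), nn.ReLU(),
    nn.Linear(60, 3), nn.ReLU(),
)

for num_pixels in (16, 25, 4, 36):
    x = torch.randn(num_pixels, 369)  # dummy input, variable pixel count
    z = encoder(x)
    print(z.shape)                    # torch.Size([num_pixels, 3])
```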
In my case, the input is [num_pixels, 369], where num_pixels varies per image. I use a batch size of 1 (batching more than one image throws an error, since the input shape differs between images), so the input to the model looks like this:
torch.Size([1, 16, 369])
torch.Size([1, 16, 369])
torch.Size([1, 25, 369])
torch.Size([1, 4, 369])
torch.Size([1, 36, 369])
As you can see, num_pixels varies, and in this particular case I cannot reshape the tensors to a common size without breaking data integrity.
Because of this I have to use a collate function:
def collate_fn(input):
    image = torch.cat(input, 0)
    return image
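To illustrate with hypothetical dummy tensors: torch.cat along dim 0 stacks the variable-length [num_pixels, 369] samples into a single [total_pixels, 369] tensor, which matches the shapes shown below:

```python
import torch

def collate_fn(batch):
    # Concatenate along dim 0: e.g. [16, 369] + [25, 369] + [4, 369] -> [45, 369]
    return torch.cat(batch, 0)

# Dummy samples with variable num_pixels (16, 25, 4)
samples = [torch.randn(16, 369), torch.randn(25, 369), torch.randn(4, 369)]
merged = collate_fn(samples)
print(merged.shape)  # torch.Size([45, 369])
```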
which makes the input to the model look like this:
with batch size 1
torch.Size([16, 369])
torch.Size([16, 369])
torch.Size([25, 369])
torch.Size([4, 369])
torch.Size([36, 369])
with batch size 3
torch.Size([57, 369])
torch.Size([40, 369])
My question is:
Does it make a difference if the input to the model is not in the shape [batch_size, num_channels, H, W]? In my case it is [batch_size*num_channels*H, W], where num_channels = 1 and H = num_pixels.
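One way to sanity-check this (a sketch with dummy data, not an authoritative answer): nn.Linear only operates on the last dimension, so a [batch_size, num_channels, H, W] input and its flattened [batch_size*num_channels*H, W] counterpart produce the same values, just arranged differently:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear = nn.Linear(369, 250)      # first encoder layer size from the model above

x4d = torch.randn(1, 1, 16, 369)  # [batch_size, num_channels, H, W]
x2d = x4d.reshape(-1, 369)        # [batch_size*num_channels*H, W] = [16, 369]

y4d = linear(x4d)                 # shape [1, 1, 16, 250]
y2d = linear(x2d)                 # shape [16, 250]

# Same numbers either way: Linear is applied independently to each length-369 row.
print(torch.allclose(y4d.reshape(-1, 250), y2d))  # True
```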