Input tensor shape to a model with only linear layers

Hi, I am trying to clarify a doubt about the shape of the input tensor. My understanding is that PyTorch expects the shape to be [batch_size, num_channels, H, W]. This is my model:

import torch
import torch.nn as nn

class auto_encode1(nn.Module):
    def __init__(self, encoding_size = 3):
        super(auto_encode1, self).__init__()
        self.encoding_size = encoding_size
        self.input_size = 369
        self.encoder = nn.Sequential(
            nn.Linear(self.input_size, 250), nn.ReLU(),
            nn.Linear(250, 125), nn.ReLU(),
            nn.Linear(125, 60), nn.ReLU(),
            nn.Linear(60, self.encoding_size), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(self.encoding_size, 60), nn.ReLU(),
            nn.Linear(60, 125), nn.ReLU(),
            nn.Linear(125, 250), nn.ReLU(),
            nn.Linear(250, self.input_size), nn.Tanh(),
        )

    def encode(self, x):
        # print(x.size())
        # keep dim 0 and flatten everything else into one feature dimension
        x = x.reshape(x.size(0), -1)
        return self.encoder(x)

    def decode(self, x):
        return self.decoder(x)

    def forward(self, x):
        x1 = self.encode(x)
        xd = self.decode(x1)
        return xd

In my case, each input has shape [num_pixels, 369], where num_pixels varies from image to image. I use a batch size of 1, because a larger batch throws an error when the images have different shapes. This is what the input to the model looks like:

torch.Size([1, 16, 369])
torch.Size([1, 16, 369])
torch.Size([1, 25, 369])
torch.Size([1, 4, 369])
torch.Size([1, 36, 369])

As you can see, num_pixels is variable, and I cannot reshape the inputs to a common size without compromising data integrity in this particular case.

Because of this I have to use a collate function:

def collate_fn(input):
    # concatenate the per-image [num_pixels, 369] tensors along dim 0
    image = torch.cat(input, 0)
    return image

which makes the input to the model look like:

with batch size 1
torch.Size([16, 369])
torch.Size([16, 369])
torch.Size([25, 369])
torch.Size([4, 369])
torch.Size([36, 369])

with batch size 3
torch.Size([57, 369])
torch.Size([40, 369])
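For reference, the shapes listed above can be reproduced with a small sketch. It assumes the collate function concatenates the per-image tensors along dim 0 with torch.cat, and it uses a plain Python list of random tensors as a stand-in dataset; the `sizes` list and the random data are illustrative only, not my real data.

```python
import torch
from torch.utils.data import DataLoader

def collate_fn(batch):
    # batch is a list of [num_pixels, 369] tensors; join them along dim 0
    return torch.cat(batch, 0)

# hypothetical stand-in dataset: five "images" with a variable number of pixels
sizes = [16, 16, 25, 4, 36]
dataset = [torch.randn(n, 369) for n in sizes]

# a plain list works as a map-style dataset (it has __getitem__ and __len__)
loader = DataLoader(dataset, batch_size=3, collate_fn=collate_fn)

shapes = [tuple(batch.shape) for batch in loader]
print(shapes)  # [(57, 369), (40, 369)]
```

The first batch holds 16 + 16 + 25 = 57 rows and the second 4 + 36 = 40 rows, which matches the sizes printed for batch size 3.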

My question is:
Does it make a difference if the input to the model is not in the shape [batch_size, num_channels, H, W]? In my case it is [batch_size*num_channels*H, W], where num_channels = 1 and H = num_pixels.

Both approaches should yield the same result for linear layers as seen here:

import torch
import torch.nn as nn

# setup
batch_size, num_channels, H, W = 3, 4, 5, 6
lin = nn.Linear(W, 10)

# multi-dim approach
x = torch.randn([batch_size, num_channels, H, W])
out = lin(x)

# 2D approach
y = x.view(-1, W)
out_y = lin(y)

# comparison
out_y = out_y.view_as(out)
print((out - out_y).abs().max())
# > tensor(0., grad_fn=<MaxBackward1>)

This works because nn.Linear is applied to the last dimension only, so all additional leading dimensions are treated as separate samples.
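The same check can be extended to a stack of linear layers with variable-sized inputs, which is closer to the use case in the question. This is only a minimal sketch: the small `model` below is a shortened stand-in for the autoencoder (not the original architecture), and the inputs are random tensors. It compares running each image separately against running the concatenated tensor once.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# shortened stand-in for the autoencoder: only per-row layers (Linear, ReLU, Tanh)
model = nn.Sequential(
    nn.Linear(369, 60), nn.ReLU(),
    nn.Linear(60, 369), nn.Tanh(),
)

# hypothetical inputs with a variable number of pixels per image
images = [torch.randn(n, 369) for n in (16, 25, 4)]

with torch.no_grad():
    # one forward pass over all rows concatenated along dim 0
    out_cat = model(torch.cat(images, 0))
    # one forward pass per image, results concatenated afterwards
    out_each = torch.cat([model(img) for img in images], 0)

# allow for tiny floating-point differences between the two paths
match = torch.allclose(out_cat, out_each, atol=1e-5)
print(out_cat.shape, match)
```

Each row is processed independently here, so concatenating images does not mix information between them. That would no longer hold if a layer that computes statistics across the batch, such as nn.BatchNorm1d in training mode, were added. Also note that the x.reshape(x.size(0), -1) call in encode leaves a 2D [N, 369] input unchanged, but would flatten a [1, 16, 369] input to [1, 5904], which nn.Linear(369, 250) would reject.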

Thanks for the snippet!