Given groups=1, weight of size [48, 32, 3, 3], expected input[1, 14, 128, 94] to have 32 channels, but got 14 channels instead

Hey guys, I’m facing an issue that seems to come from my DataLoader.

For some reason, one of the last entries in the dataset has 14 channels instead of the expected 32. Could this be caused by my DataLoader?

Here is the Dataset class:

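import h5py
from torch.utils.data import Dataset

class LogmelDataset(Dataset):
    def __init__(self, h5_file, feature_type, transform=None):
        # Open the file that was passed in rather than a hard-coded path
        self.h5_file = h5py.File(h5_file, "r")
        self.feature_imgs = self.h5_file[feature_type]
        self.transform = transform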

    def __len__(self):
        return len(self.feature_imgs)

    def __getitem__(self, idx):
        if self.transform:
            img = self.transform(self.feature_imgs[idx])
            return img
        return self.feature_imgs[idx]

Here is the model I’m using:

import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()

        self.encoder = nn.Sequential(
            nn.Conv2d(32, 48, 3, padding=(1, 2)),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(48, 16, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )

        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 48, 2, stride=2),
            nn.ReLU(),
            nn.ConvTranspose2d(48, 32, 2, stride=2),
            nn.ReLU(),
            nn.Conv2d(32, 32, 2, stride=1, padding=(1, 0)),
            nn.ReLU(),
            nn.Conv2d(32, 32, 2, stride=1, padding=(0, 0)),
            nn.Sigmoid()
        )

    def forward(self, X):
        encoded = self.encoder(X)
        decoded = self.decoder(encoded)
        return decoded

I’m trying to use an autoencoder to reconstruct bird-singing spectrograms. Currently using this dataset: bird song data set | Kaggle

And the data loader is being used as follows:

import torch

logmel_dataset = LogmelDataset("bird_features.h5", "logmel")

loader = torch.utils.data.DataLoader(
    dataset=logmel_dataset,
    batch_size=32,
    shuffle=True
)

For some reason the training works perfectly but suddenly fails when processing the last batch. All the batches have the shape [32, 128, 94], but the last batch has a shape of [14, 128, 94].

Hi João!

A Conv2d accepts an input tensor of either four or three dimensions. In the
four-dimensional case the first dimension is the batch dimension and the second
dimension is the channels dimension.

In the three-dimensional case there is no batch dimension and the first dimension
is the channels dimension.

Your initial Conv2d (32, 48, 3, padding = (1, 2)) expects 32 input channels.
Your “batches” are single samples with no batch dimension (or an implicit batch
dimension of one, if you prefer), so the leading dimension is the channels dimension.
So most of your “batches” are single samples with 32 channels, matching your Conv2d,
but your last “batch” has 14 channels, so you get the error.

It would appear that your dataset – presumably bird_features.h5 – contains
samples with no channels dimension (or maybe a single explicit channel).

Your DataLoader explicitly creates batches with batch_size = 32 (with the last
batch having whatever is left over, in your case 14 samples).
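To make the "whatever is left over" part concrete, here is a small sketch of the batch-size arithmetic. The dataset size of 110 is a made-up number chosen only because it leaves 14 samples over; the real size of bird_features.h5 is not stated in this thread, and batch_sizes is a hypothetical helper, not a PyTorch function.

```python
def batch_sizes(num_samples, batch_size):
    """Sizes of the batches a DataLoader yields when drop_last is False (the default)."""
    full, rest = divmod(num_samples, batch_size)
    # Every full batch has batch_size samples; a non-empty remainder forms the last batch.
    return [batch_size] * full + ([rest] if rest else [])


# 110 is a hypothetical dataset size (3 * 32 + 14), not the real one.
sizes = batch_sizes(110, 32)
print(sizes)  # [32, 32, 32, 14]
```

Any dataset size that is not a multiple of 32 ends with a smaller batch like this, which is why only the final step failed.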

If your data samples really have only a single channel, create your initial Conv2d
with a single input channel as Conv2d (1, 48, 3, padding = (1, 2)) and make
sure that your input tensors have an explicit channels dimension of size 1 (using
something like unsqueeze(), as necessary), with, e.g., a shape of [32, 1, 128, 94].
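A minimal sketch of that suggestion, assuming the samples really are single-channel spectrograms of shape [128, 94]. Random tensors stand in for the real data, and the variable names are made up for illustration.

```python
import torch
import torch.nn as nn

# First layer with a single input channel, as suggested above.
conv = nn.Conv2d(1, 48, 3, padding=(1, 2))

# Stand-ins for what the DataLoader currently yields: [B, 128, 94].
full_batch = torch.randn(32, 128, 94)
last_batch = torch.randn(14, 128, 94)

# Insert an explicit channels dimension at position 1: [B, 128, 94] -> [B, 1, 128, 94].
full_4d = full_batch.unsqueeze(1)
last_4d = last_batch.unsqueeze(1)

full_out = conv(full_4d)
last_out = conv(last_4d)
print(full_4d.shape, full_out.shape)
print(last_4d.shape, last_out.shape)
```

With the channels dimension in place, the batch size no longer has to match anything in the model, so the 14-sample batch goes through like the others.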

Best.

K. Frank

Building on @KFrank’s diagnosis: the reason it appeared to “work” until the last batch is a subtle but important gotcha.

Your DataLoader is stacking 2D items of shape [128, 94] into 3D batches of shape [B, 128, 94]. Conv2d happily accepts a 3D input and interprets it as a single unbatched sample of shape [C, H, W]. When B = 32, that 3D tensor happened to match the 32 input channels expected by your Conv2d(32, 48, ...), a pure coincidence between your batch size and your first layer’s in_channels. So Conv2d was treating your entire batch as one sample with 32 channels.

Your model wasn’t training on batches at all. It was processing one merged “sample” per step where the 32 examples were stacked along the channel axis. The loss was computed, gradients flowed, and the loop ran without errors, but it wasn’t doing what you intended. The last batch (14 samples) finally raised the channel mismatch because 14 is not equal to 32, exposing the underlying shape bug.
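Here is a small sketch that reproduces the behaviour described above, using random tensors in place of the real spectrograms. It assumes a PyTorch version in which Conv2d accepts unbatched [C, H, W] input, which the error message in the title suggests is the case here.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(32, 48, 3, padding=(1, 2))

# A "batch" of 32 two-dimensional samples, as the DataLoader stacks them.
full_batch = torch.randn(32, 128, 94)
out = conv(full_batch)
# The result has no batch dimension: the input was read as one sample with 32 channels.
print(out.shape)  # torch.Size([48, 128, 96])

# The leftover batch of 14 samples is read as one sample with 14 channels.
last_batch = torch.randn(14, 128, 94)
raised = False
try:
    conv(last_batch)
except RuntimeError as err:
    raised = True
    print(err)
```

The missing batch dimension in the output is the giveaway: a real batch of 32 would come out as [32, 48, 128, 96].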

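One fix is to add a channel dimension in __getitem__ so that the DataLoader produces proper 4D batches of shape [B, 1, H, W], together with changing the first Conv2d to take a single input channel, as @KFrank suggested.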
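A rough sketch of what that could look like. To keep it runnable without the HDF5 file, the hypothetical ChannelFirstDataset below wraps an in-memory NumPy array instead of bird_features.h5, leaves out the optional transform, and uses a made-up dataset size of 46 samples (32 + 14).

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class ChannelFirstDataset(Dataset):
    """Hypothetical stand-in for LogmelDataset, backed by an array of shape [N, H, W]."""

    def __init__(self, features):
        self.features = features

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        img = torch.as_tensor(self.features[idx], dtype=torch.float32)
        # Add the channel dimension: [H, W] -> [1, H, W].
        return img.unsqueeze(0)


# Made-up data: 46 random "spectrograms" of shape [128, 94].
features = np.random.rand(46, 128, 94).astype(np.float32)
loader = DataLoader(ChannelFirstDataset(features), batch_size=32, shuffle=False)

shapes = [tuple(batch.shape) for batch in loader]
print(shapes)  # [(32, 1, 128, 94), (14, 1, 128, 94)]
```

In the real LogmelDataset the same unsqueeze(0) would go at the end of __getitem__, after converting the HDF5 row to a tensor, provided the transform does not already add a channel dimension.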

Hey, thanks @KFrank and @Aditya_Mehra, I really appreciate it! This actually solved my problem.

I just forgot to add the channel dimension. Thank you so much for your answers!