Shape error while implementing U-Net (encoder part) in PyTorch

I am trying to learn how to build a U-Net architecture from scratch. I have written the code further down, but when I run the encoder part to check its output, it fails. When you run the snippet below, you get the RuntimeError shown after it:

import torch
import torch.nn as nn

batch = 1
channels = 3
width = 512 # same as height

image = torch.randn(batch, channels, width, width)

enc = Encoder(channels)
enc(image)

RuntimeError: Given groups=1, weight of size [128, 64, 3, 3], expected input[1, 3, 512, 512] to have 64 channels, but got 3 channels instead

Below is the code:

class ConvolutionBlock(nn.Module):
    '''
    Basic convolution block: Convolution -> ReLU -> Convolution -> ReLU,
    followed by MaxPool2d (encoder) or ConvTranspose2d (decoder).
    '''
    def __init__(self, in_channels, out_channels, upsample:bool = False,):
        '''
        args:
            upsample: If True, use ConvTranspose2d (i.e. the block is used in the decoder part) instead of MaxPool2d
            batch norm is not part of the original U-Net, but it might be useful to add
        '''
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size = 3, padding= 1), # padding is 0 by default; padding=1 with a 3x3 kernel keeps output width and height equal to the input's
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size = 3, padding = 1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 2, stride = 2)  if not upsample else nn.ConvTranspose2d(out_channels, out_channels//2, kernel_size = 2, )  # per the paper, the up-convolution halves the number of feature channels
        )

    def forward(self, feature_map_x):
        '''
        feature_map_x could be the image itself or the
        '''
        return self.network(feature_map_x)

class Encoder(nn.Module):
    '''
    '''
    def __init__(self, image_channels:int = 1, repeat:int = 4):
        '''
        In U-Net, the number of feature channels starts at 64 and doubles at every block until it reaches the bottleneck
        '''
        super().__init__()
        in_channels = [image_channels,64, 128, 256, 512]
        out_channels = [64, 128, 256, 512, 1024]

        self.layers = nn.ModuleList(
            [ConvolutionBlock(in_channels = in_channels[i], out_channels = out_channels[i]) for i in range(repeat+1)]
        )
    
    def forward(self, feature_map_x):
        for layer in self.layers:
            out = layer(feature_map_x)
        return out

EDIT: Running each block on its own with the code below gives me the expected summary too:


in_ = [3,64, 128, 256, 512]
ou_ = [64, 128, 256, 512, 1024]
width = 512

from torchsummary import summary   
 
for i in range(5):   
    cb = ConvolutionBlock(in_[i], ou_[i])
    summary(cb, (in_[i],width,width))
    
    print('#'*50)

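Your forward definition looks wrong: inside the loop, every layer receives the original input feature_map_x instead of the previous layer's output. The second block expects 64 channels but gets the 3-channel image, which is exactly what the error message reports: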

    def forward(self, feature_map_x):
        for layer in self.layers:
            out = layer(feature_map_x)
        return out

This should work, because each layer's output is fed to the next layer:

    def forward(self, feature_map_x):
        out = feature_map_x
        for layer in self.layers:
            out = layer(out)
        return out
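
To check the difference between the two loops, here is a minimal sketch. It does not use the classes from the question: make_block is a hypothetical, simplified stand-in for ConvolutionBlock, and the small channel counts and 32x32 input are assumptions chosen only to keep it fast. It shows the unchained loop failing on the second layer and the chained loop producing the expected shapes.

```python
import torch
import torch.nn as nn

# Hypothetical, simplified stand-in for ConvolutionBlock:
# one conv -> ReLU -> 2x2 max-pool (halves height and width).
def make_block(in_channels, out_channels):
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

# Assumed small channel counts (not the U-Net sizes) to keep the check cheap.
channels = [3, 8, 16, 32]
layers = nn.ModuleList(
    [make_block(channels[i], channels[i + 1]) for i in range(len(channels) - 1)]
)

def forward_unchained(x):
    # Same pattern as the original forward: every layer gets the raw input.
    for layer in layers:
        out = layer(x)
    return out

def forward_chained(x):
    # Corrected pattern: each layer consumes the previous layer's output.
    out = x
    shapes = []
    for layer in layers:
        out = layer(out)
        shapes.append(tuple(out.shape))
    return out, shapes

image = torch.randn(1, 3, 32, 32)

# The second layer expects 8 channels but receives the 3-channel image.
try:
    forward_unchained(image)
    unchained_failed = False
except RuntimeError:
    unchained_failed = True

with torch.no_grad():
    out, shapes = forward_chained(image)

print(unchained_failed)
for shape in shapes:
    print(shape)
```

With the chained loop, the channel count grows and the spatial size halves at every block, which is the behaviour you would expect from the encoder.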