Modules defined in __init__() but not used in forward() still affect the results

I have defined a channel attention block in __init__() but never call it in forward(). However, the network's performance still improved noticeably. I'm confused. Any suggestions? Thanks!

import torch
import torch.nn as nn

# FeatModule, InBlock, ChannelAttention, RefineUnet, OutBlock_a and OutBlock_b
# are custom blocks defined elsewhere in the project.

class ImgFeatMSNet2(nn.Module):
    '''
    Image- and feature-space multi-scale deconvolution network: Wiener
    deconvolution is applied to each image scale separately and the results
    are merged in the refine net.
    '''

    def __init__(self, n_colors, channel_atten=True):
        super(ImgFeatMSNet2, self).__init__()
        # params
        self.channel_atten = channel_atten
        # self.n_colors = n_colors
        n_resblock = 3
        n_feats1 = 16
        n_feats = 32
        kernel_size = 5
        padding = 2

        self.FeatModule1 = FeatModule(
            n_in=n_colors, n_feats=n_feats1, kernel_size=kernel_size, padding=padding, act=True)

        self.InBlock1 = InBlock(n_in=n_feats1 + n_colors, n_feats=n_feats,
                                kernel_size=kernel_size, padding=padding, act=True)

        if channel_atten:
            self.ChannelAttention1 = ChannelAttention(
                in_planes=n_feats1+n_colors, ratio=2)
            self.ChannelAttention2 = ChannelAttention(
                in_planes=n_feats1 + n_feats+n_colors, ratio=4)

        self.RefineUnet = RefineUnet(
            n_feats=n_feats, n_resblock=n_resblock, kernel_size=kernel_size, padding=padding, act=True)

        self.OutBlock_a = OutBlock_a(
            n_feats=n_feats, n_resblock=n_resblock, kernel_size=kernel_size, padding=padding)
        self.OutBlock_b = OutBlock_b(
            n_feats=n_feats, n_out=n_colors, kernel_size=kernel_size, padding=padding)

    def forward(self, input):
        # scale 1: x0.5; scale 2: x1.0
        n, c, h, w = input.shape

        feat_2 = self.FeatModule1(input)  # get image feature
        wd_1 = torch.cat((feat_2, input), 1)  # cat image and feature


        # The following code is not used
        # if self.channel_atten:
        #     wd_1 = self.ChannelAttention1(wd_1)
        
        in_1 = self.InBlock1(wd_1)
        refine_1 = self.RefineUnet(in_1, int(round(h/2)), int(round(w/2)))
        out_a1 = self.OutBlock_a(refine_1)
        out_b1 = self.OutBlock_b(out_a1)

        return out_b1

Creating an additional layer initializes its parameters (assuming the layer contains parameters) by sampling random numbers from the pseudorandom number generator (PRNG). This advances the PRNG state and affects all subsequent calls to it, so the layers created afterwards receive different initial weights, and the run will not produce the same results even though the additional layer is never used in forward. Changing the seed would have a similar effect, and if your model is that sensitive to the seed, it means the overall training is not particularly stable.
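
Here is a minimal, self-contained sketch of the effect (not your model; WithExtra, use_extra and the layer sizes are made up for illustration). Both models are constructed from the same seed, but the second one creates an extra layer that is never called in forward(), which consumes PRNG draws and changes the initialization of the layer created after it:

import torch
import torch.nn as nn

class WithExtra(nn.Module):
    def __init__(self, use_extra):
        super().__init__()
        if use_extra:
            # never used in forward(), but its init consumes PRNG draws
            self.extra = nn.Linear(8, 8)
        self.head = nn.Linear(8, 1)

    def forward(self, x):
        return self.head(x)

torch.manual_seed(0)
a = WithExtra(use_extra=False)
torch.manual_seed(0)
b = WithExtra(use_extra=True)

# Same seed, but b.head was initialized after b.extra advanced the PRNG,
# so its initial weights differ from a.head's.
print(torch.equal(a.head.weight, b.head.weight))  # False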

Thanks for your answer! I will try to use a different initialization method and check the result.
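
One way to run that check is to re-initialize every parameterized submodule from a seed derived from its (unique) name, so the initial weights no longer depend on which other modules were constructed first. This is only a sketch under that assumption; reseed_init and the seed scheme are not from any PyTorch API:

import zlib
import torch
import torch.nn as nn

def reseed_init(model, base_seed=0):
    # Give every submodule that defines reset_parameters() its own seed
    # derived from the module's name, so its initial weights do not depend
    # on how many other (possibly unused) modules were built before it.
    for name, m in model.named_modules():
        if hasattr(m, "reset_parameters"):
            torch.manual_seed(base_seed + zlib.crc32(name.encode()))
            m.reset_parameters()

# Usage: build the model with and without the unused block, call
# reseed_init(model) on both, and layers with the same names now start
# from identical weights, so any remaining difference in results comes
# from training rather than from initialization order.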