Unused model parameters affect optimization for Adam

Thanks for the code update!
The difference in your implementation is due to the order of instantiation of the layers. Since you are creating the unused conv layers before the linear layer, the PRNG will be have additional calls and the weights of the linear layers will differ.
If you change the __init__ method of MyModuleUnused to

        super(MyModelUnused, self).__init__()
        self.conv_list = nn.ModuleList()
        self.conv_list.append(nn.Conv2d(3, 6, 3, 1, 1))
        self.conv_list.append(nn.Conv2d(6, 12, 3, 1, 1))
        self.pool1 = nn.MaxPool2d(2)
        self.pool2 = nn.MaxPool2d(2)
        self.fc = nn.Linear(12*6*6, 2)
        self.conv_list.append(nn.Conv2d(12, 24, 3, 1, 1))
        self.conv_list.append(nn.Conv2d(24, 12, 3, 1, 1))

, you’ll get the same results again.
Alternatively, you could set the seed before instantiating each layer.

2 Likes