Problem using nn.Sequential

While trying to train a model on the CIFAR10 dataset I encountered a problem using nn.Sequential.

When I train the following model implementation everything goes as expected. The loss drops and the accuracy increases:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        print("After conv_block_1:", x.shape)

        x = self.pool(F.relu(self.conv2(x)))
        print("After conv_block_2:", x.shape)

        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

However, when I try to implement the same model using nn.Sequential, the loss stays the same after every epoch:

class NetV3(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_block = nn.Sequential(
            nn.Conv2d(3, 6, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(6, 16, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, 10)
        )

    def forward(self, x):
        x = self.conv_block(x)
        print("After conv_block:", x.shape) 
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

The training loop I used is the following:

for epoch in range(3):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print(f'[{epoch + 1}, {i + 1:5d}] loss: {running_loss / 2000:.3f}')
            running_loss = 0.0

print('Finished Training')

Hi Ektoras!

I don’t see any difference between your Net and NetV3 models. Unless I’m missing
something, they should be the same.
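
(As a quick sanity check, here is a sketch that lines up the two models' parameters side by side; the names differ, but the shapes and their order should match:)

for (na, pa), (nb, pb) in zip(Net().named_parameters(),
                              NetV3().named_parameters()):
    print(f'{na} {tuple(pa.shape)}  <->  {nb} {tuple(pb.shape)}')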

First, start by printing out loss (which hides less information than running_loss) and
running_loss after every batch (not just starting with the 2000th batch).
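
For example, a minimal sketch of that per-batch logging, reusing the names from your loop:

running_loss = 0.0
for i, data in enumerate(trainloader, 0):
    inputs, labels = data
    optimizer.zero_grad()
    outputs = net(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()

    running_loss += loss.item()
    # per-batch logging: loss.item() is the raw batch loss, while
    # running_loss / (i + 1) is the average over the epoch so far
    print(f'batch {i + 1}: loss = {loss.item():.4f}, '
          f'running_loss = {running_loss / (i + 1):.4f}')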

Also try using something like torch.manual_seed(20001999) to make sure that your
two versions of the model have their layers initialized identically. Likewise, if your
trainloader uses any randomness (which it normally would, e.g., for shuffling), make
sure that your two runs use the same trainloader random sequence.
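
Something like the following sketch should work (batch_size and the trainset name here are just placeholders, taken from the standard CIFAR10 tutorial):

import torch
from torch.utils.data import DataLoader

# seed the global RNG before constructing each model so that the
# parameter-bearing layers (Conv2d, Linear) draw the same initial
# weights; ReLU and MaxPool2d consume no random numbers
torch.manual_seed(20001999)
net = Net()        # or NetV3(); both draw the same initial weights

# pass a seeded generator to the DataLoader so that both runs see
# the same shuffling order
g = torch.Generator()
g.manual_seed(20001999)
trainloader = DataLoader(trainset, batch_size=4, shuffle=True, generator=g)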

It could simply be that your two training runs don’t match one another (because of the
initialization and trainloader randomness) and that the NetV3 run happens to plateau,
while the Net training run doesn’t.

If you initialize your two models identically and pass them the same data, their outputs
should also be identical (or possibly differ by floating-point round-off error). Verify that
this is the case with a single forward pass. Then check that your first backward pass and
optimization step also yield the same results, and so on.
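
Here is a rough sketch of such a check (this assumes the seeding above, and that your criterion is nn.CrossEntropyLoss, as in the CIFAR10 tutorial):

# seed identically, build both models, and compare a single forward
# pass on the same batch
torch.manual_seed(20001999)
net_a = Net()
torch.manual_seed(20001999)
net_b = NetV3()

inputs, labels = next(iter(trainloader))
with torch.no_grad():
    print(torch.allclose(net_a(inputs), net_b(inputs)))  # expect True

# then compare the gradients from the first backward pass
criterion = nn.CrossEntropyLoss()
for model in (net_a, net_b):
    model.zero_grad()
    criterion(model(inputs), labels).backward()
print(torch.allclose(net_a.conv1.weight.grad,
                     net_b.conv_block[0].weight.grad))  # expect True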

Best.

K. Frank
