VGG11 isn't training on CIFAR-10

I am trying to train VGG11 on the CIFAR-10 dataset, but for some reason it’s not training. I have tried to rule out the usual mistakes, like adding a softmax activation after the final linear layer, and I have also cross-checked the outputs. This is how I have defined the network:

import torch
import torchvision
import torch.nn as nn

cifar_class = torchvision.models.vgg11(pretrained=False)
cifar_class.classifier.add_module('7', nn.Linear(in_features=1000, out_features=10, bias=True))  # append a 1000 -> 10 head for the 10 CIFAR-10 classes
cifar_class.features.add_module('0', nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, stride=1, padding=1, bias=True))  # replace the first conv to accept 1-channel (grayscale) input
cifar_class = cifar_class.to(device)
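
For reference, this is the kind of output check I mean (a minimal sketch, assuming the definitions above):

x = torch.randn(2, 1, 32, 32).to(device)   # dummy 1-channel batch
print(cifar_class(x).shape)                # expected: torch.Size([2, 10])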

I want to use grayscale images and I am loading the dataset like this:

from torchvision import datasets, transforms

training_data = datasets.CIFAR10(root="data", train=True, download=True,
                                 transform=transforms.Compose([
                                     transforms.ToTensor(),
                                     transforms.Grayscale(num_output_channels=1)
                                 ]))

validation_data = datasets.CIFAR10(root="data", train=False, download=True,
                                   transform=transforms.Compose([
                                       transforms.ToTensor(),
                                       transforms.Grayscale(num_output_channels=1)
                                   ]))

And finally, the training process:

from tqdm import tqdm

for epoch in range(10):  # loop over the dataset multiple times

    running_loss = 0.0
    running_corrects = 0
    for i, data in enumerate(tqdm(training_loader)):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data[0].cuda(), data[1].cuda()

        # zero the parameter gradients
        optimizer.zero_grad()

        outputs = cifar_class(inputs)
        _, preds = torch.max(outputs, 1)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item() * inputs.size(0)
        running_corrects += torch.sum(preds == labels.data)
    print('[%d, %5d] loss: %.3f acc: %.3f' %
              (epoch + 1, i + 1, running_loss / 50000, running_corrects.double() / 50000))

print('Finished Training')

What am I doing wrong?

Hello Flock!

Could you include your training loader, optimizer, and loss function definitions so we can replicate the issue?

Thanks!
Andrei

Here’s the code:

import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # takes raw logits, so no softmax after the last layer
optimizer = optim.Adam(cifar_class.parameters(), lr=0.001)
exp_lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)  # decays the LR by 10x every 7 epochs

Here is the loader:

from torch.utils.data import DataLoader

training_loader = DataLoader(training_data,
                             batch_size=16,
                             shuffle=True,
                             pin_memory=True)

validation_loader = DataLoader(validation_data,
                               batch_size=16,
                               shuffle=True,
                               pin_memory=True)

It looks like you forgot to Resize your input to what VGG normally expects (224×224). There are some other small issues (e.g. you didn’t Normalize your input), but from quickly playing around with it, the missing Resize seems to be the main culprit.

Here I am (over)fitting on just the first few batches, with shuffle=False, and you can clearly see the difference in loss convergence.

[image: training loss per batch, with and without Resize]

Btw, the jumpy loss (for “with resize”) is expected, since there are some Dropout layers in the network (which normally get turned off for validation), so it will never settle at exactly zero.
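
If you want a smoother curve for this kind of check, you can measure the loss with Dropout switched off, along these lines (a minimal sketch, reusing the names from your training loop):

cifar_class.eval()     # puts the Dropout layers into eval mode
with torch.no_grad():  # no gradients needed when only measuring loss
    print(criterion(cifar_class(inputs), labels).item())
cifar_class.train()    # switch back before resuming training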

This is how you resize:

transform=transforms.Compose([
    transforms.Resize(224),  # this!
    transforms.ToTensor(),
    # transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),  # you may want to add this too (3-channel ImageNet stats, so keep it before Grayscale)
    transforms.Grayscale(num_output_channels=1),
])

Hope this helps!


Hmmm. I don’t know how resizing will help, considering that CIFAR images are only 32×32. I’ll try the normalizing part. I used this as a reference, and I don’t see resizing there, but I do see normalizing.

What do you think will happen if 32×32 is resized to 224×224?

I think the kernels of the convolutions are sized according to some assumption about what area of the total input they cover. When the total input is much smaller, the kernels may be too large to capture the kind of features they ought to. You don’t add any information by upsampling from 32 to 224, but it’s possible that the “resolution” of the convolutions is inappropriate unless the input is 224. Concretely, VGG11 halves the spatial size five times, so a 32×32 input is already down to a 1×1 feature map before the classifier, while a 224×224 input still leaves 7×7. I find it pretty convincing that toggling that one knob was enough to make the whole thing work in the example I showed you.
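
You can check this yourself (a minimal sketch, using the stock 3-channel model):

import torch
import torchvision

m = torchvision.models.vgg11(pretrained=False)
print(m.features(torch.randn(1, 3, 32, 32)).shape)    # torch.Size([1, 512, 1, 1])
print(m.features(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 512, 7, 7])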

(Note another difference between your training and the one from your reference: they use SGD with weight decay and momentum, while you use Adam without weight decay. In my experiments that also helped a little.)
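
Something along these lines (the exact hyperparameters are just illustrative; check what your reference uses):

optimizer = optim.SGD(cifar_class.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)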


That’s interesting. I’ll try this.

I also thought of this. I’ll change it and see if it helps. Thanks though.

I want to apologize for this thread. I don’t know what happened to me that I forgot some basic deep learning troubleshooting. Changing to SGD worked.