My model seems not to be training

Patrice · August 14, 2020, 7:16am

Hello dear engineers,
Hope you are all doing well. I am still taking my first steps in both PyTorch and programming. My task is a binary segmentation problem, which I would like to approach as a binary classification. My data are RGB images of size (360, 360, 3) and the labels are black and white. I have adopted the following simple CNN model for training.

import torch.nn as nn

class CNNModel(nn.Module):
    def __init__(self):
        super(CNNModel, self).__init__()

        self.cnn1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, stride=1, padding=0)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2)

        self.cnn2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=0)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2)

        self.dropout = nn.Dropout(p=0.5)

        # For a 360x360 input: conv 356 -> pool 178 -> conv 174 -> pool 87
        self.fc1 = nn.Linear(32 * 87 * 87, 2)

    def forward(self, x):
        out = self.cnn1(x)
        out = self.relu1(out)
        out = self.maxpool1(out)
        out = self.cnn2(out)
        out = self.relu2(out)
        out = self.maxpool2(out)
        out = out.view(out.size(0), -1)
        out = self.dropout(out)
        out = self.fc1(out)

        return out
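As a quick sanity check on the `32 * 87 * 87` input size of `fc1`, the spatial size after each layer can be computed with the usual output-size formula for `nn.Conv2d` (dilation 1), floor((H + 2·padding − kernel_size) / stride) + 1, keeping in mind that `nn.MaxPool2d(kernel_size=2)` uses a stride of 2 by default. The helper names `conv_out` and `pool_out` are hypothetical, written only for this check.

```python
def conv_out(size, kernel_size, stride=1, padding=0):
    # Output size of nn.Conv2d along one spatial dimension (dilation=1)
    return (size + 2 * padding - kernel_size) // stride + 1


def pool_out(size, kernel_size=2):
    # nn.MaxPool2d defaults its stride to kernel_size and its padding to 0
    return (size - kernel_size) // kernel_size + 1


h = 360
h = conv_out(h, kernel_size=5)  # after cnn1
h = pool_out(h)                 # after maxpool1
h = conv_out(h, kernel_size=5)  # after cnn2
h = pool_out(h)                 # after maxpool2
print(h, 32 * h * h)            # spatial size and fc1 in_features
```

This confirms that `in_features` matches a 360x360 input; any other input size would need a different value for `fc1`.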

My training script is as follows.


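import torch
import torch.nn as nn
from torch.utils import data
from tqdm import tqdm

# Config and Dataset are user-defined classes (not shown here)
opt = Config()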
device = torch.device('cuda') if opt.use_gpu else torch.device('cpu')
num_classes = 2

train_dataset = Dataset(opt.train_input, opt.train_label)
print(len(train_dataset))
# print(train_dataset.__getitem__(0)[0].shape)
train_loader = data.DataLoader(train_dataset,
                                batch_size=opt.train_batch_size,
                                shuffle=True,
                                num_workers=opt.num_workers)

valid_dataset = Dataset(opt.valid_input, opt.valid_label)

print(len(valid_dataset))
valid_loader = data.DataLoader(valid_dataset,
                                batch_size=opt.train_batch_size,
                                shuffle=False,  # order does not matter for evaluation
                                num_workers=opt.num_workers)


CE = nn.CrossEntropyLoss().to(device)


def train(train_loader, model, optimizer, criterion=CE):

    model.train()

    for i, (input, target) in enumerate(tqdm(train_loader)):
        # torch.autograd.Variable is no longer needed; plain tensors track gradients
        input = input.to(device)
        target = target.to(device).long()
        output = model(input)
        # target[:, 2, 0, 0] takes channel 2 of the top-left pixel of each label image as its class
        loss = criterion(output, target[:,2,0,0])

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


def test(model, test_loader):
    model.eval()
    correct = 0
    total = 0

    # No gradients are needed for evaluation
    with torch.no_grad():
        for i, (input, target) in enumerate(tqdm(test_loader)):
            input = input.to(device)
            # Cast to long as in train() so the comparison uses the same class indices
            target = target.to(device).long()

            total += target.size(0)
            output = model(input)
            _, predicted = output.max(1)
            correct += predicted.eq(target[:,2,0,0]).sum().item()

    accuracy = 100. * correct / total

    return accuracy


def main_ce():
    model_ce = CNNModel().to(device)
    best_valid_acc = 0
    learning_rate = 1e-4
    # Create the optimizer once so Adam keeps its moment estimates across epochs
    optimizer_ce = torch.optim.Adam(model_ce.parameters(), lr=learning_rate)

    for epoch in range(20):
        print("epoch=", epoch)
        print("training model_ce...")
        train(train_loader=train_loader, model=model_ce, optimizer=optimizer_ce)
        print("validating model_ce...")
        valid_acc = test(model=model_ce, test_loader=valid_loader)
        print('valid_acc=', valid_acc)
        if valid_acc >= best_valid_acc:
            best_valid_acc = valid_acc
            torch.save(model_ce, './model_ce_' + str(1) + '_' + str(1) + '_' + str(1))
            print("saved.")
        # Note: no separate test loader is defined, so this re-evaluates the validation set
        print("testing model_ce...")
        test_acc = test(model=model_ce, test_loader=valid_loader)
        print('test_acc=', test_acc)



if __name__ == '__main__':
    main_ce()

The code runs without errors. However, the accuracy stays the same across all 20 epochs of training. I suspect something is wrong with the code. Could anyone help me check it? Any comments or suggestions would be highly appreciated.
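To see what the loss is actually computed against, here is a small NumPy sketch that mimics the indexing `target[:,2,0,0]` on a hypothetical batch of label images, assumed to have shape (N, 3, H, W) with a white (1) foreground on a black (0) background. The index selects channel 2 of the top-left pixel, so each image contributes a single class value taken from one corner pixel, regardless of what the rest of the mask contains.

```python
import numpy as np

# Hypothetical batch: 4 binary masks with 3 identical channels, 8x8 pixels each
N, C, H, W = 4, 3, 8, 8
target = np.zeros((N, C, H, W), dtype=np.int64)
target[:, :, 2:6, 2:6] = 1  # white square in the middle of every mask

labels = target[:, 2, 0, 0]  # same indexing as in train() and test()
print(labels)                # one value per image, read from pixel (0, 0)
print(labels.shape)

# Fraction of foreground pixels actually present in each mask
foreground_fraction = target[:, 0].mean(axis=(1, 2))
print(foreground_fraction)
```

If the top-left pixel happens to be background in every label image, every target is 0, and a model that always predicts class 0 would score the same accuracy in every epoch.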

ptrblck · August 17, 2020, 9:16am

I would recommend trying to overfit a small data sample (e.g. just 10 samples) to make sure your model is able to do so.
If that doesn’t work out of the box, you could play around with some hyperparameters.
Once this is done, you could (carefully) scale up the problem again.
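A minimal, self-contained sketch of that check is shown below. It assumes random stand-in data and a small stand-in model (both hypothetical, so the snippet runs on its own); with the real setup, `full_dataset` would be the training dataset and `model` would be `CNNModel`. The idea is to keep only 10 samples via `torch.utils.data.Subset`, train on them repeatedly, and confirm the loss goes close to zero. The step count and learning rate are arbitrary choices.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in for the real dataset: 100 random RGB images, 2 classes
images = torch.randn(100, 3, 32, 32)
labels = torch.randint(0, 2, (100,))
full_dataset = TensorDataset(images, labels)

# Keep only the first 10 samples for the overfitting check
small_loader = DataLoader(Subset(full_dataset, range(10)), batch_size=10, shuffle=True)

# Small stand-in model (use CNNModel instead for 360x360 inputs)
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 2),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train()
for step in range(200):
    for x, y in small_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Evaluate on the same 10 samples: a model that can learn should fit them exactly
model.eval()
with torch.no_grad():
    accuracy = (model(images[:10]).argmax(dim=1) == labels[:10]).float().mean().item()
print(f"final loss {loss.item():.4f}, accuracy on the 10 samples {accuracy:.2f}")
```

If even 10 samples cannot be fitted this way, it is worth checking the inputs and targets before tuning anything else.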

Patrice · August 20, 2020, 1:43am

Dear Mr. Ptrblck,
Thank you very much for the suggestion, and sorry for the late reply. I have been trying other suggestions I found on this forum. My data are RGB images from which I generated the corresponding labels. After checking carefully, I noticed that the labels also have three channels. I then tried to fix them as follows.

label = np.where(label_img > 0.0, 1, 0)

When I print the labels, the shape is (N, H, W). Am I doing this correctly? How can I map the colors to obtain grayscale labels?
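For reference, `np.where` works element-wise, so applied to an (H, W, 3) label image it returns an (H, W, 3) array; a mask with one value per pixel needs a reduction over the channel axis. The sketch below assumes a hypothetical channels-last label image with a white foreground and uses `.any(axis=-1)` to mark a pixel as foreground if any of its channels is non-zero.

```python
import numpy as np

# Hypothetical 4x4 RGB label image (H, W, 3): white square on a black background
label_img = np.zeros((4, 4, 3), dtype=np.uint8)
label_img[1:3, 1:3] = 255

per_channel = np.where(label_img > 0, 1, 0)
print(per_channel.shape)  # element-wise, so the channel axis is kept

# Collapse the channel axis: a pixel is foreground if any channel is non-zero
label = (label_img > 0).any(axis=-1).astype(np.int64)
print(label.shape)
print(label)
```

Stacking N of these (H, W) masks yields an (N, H, W) array of 0/1 values.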

ptrblck · August 21, 2020, 8:44am

Since you are using two output channels in your model and nn.CrossEntropyLoss, you would have to apply a mapping between the colors of your target maps and the class indices.
If your current targets contain (normalized) RGB values, label would be a tensor full of ones.
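A minimal sketch of such a mapping, assuming a hypothetical palette in which black (0, 0, 0) is background (class 0) and white (255, 255, 255) is foreground (class 1): every pixel whose RGB triple matches a palette color gets that color's class index, giving an integer (H, W) map. Note that `nn.CrossEntropyLoss` pairs a class-index target of shape (N, H, W) with a model output of shape (N, 2, H, W), not the (N, 2) output the current `CNNModel` produces. The names `COLOR_TO_CLASS` and `rgb_to_class_map` are illustrative only.

```python
import numpy as np

# Hypothetical palette: RGB color -> class index
COLOR_TO_CLASS = {
    (0, 0, 0): 0,        # black background
    (255, 255, 255): 1,  # white foreground
}


def rgb_to_class_map(label_img):
    """Map an (H, W, 3) uint8 label image to an (H, W) int64 class-index map."""
    class_map = np.full(label_img.shape[:2], -1, dtype=np.int64)
    for color, cls in COLOR_TO_CLASS.items():
        # True where all three channels equal this palette color
        matches = np.all(label_img == np.array(color, dtype=label_img.dtype), axis=-1)
        class_map[matches] = cls
    if (class_map < 0).any():
        raise ValueError("label image contains colors that are not in the palette")
    return class_map


# Example: 3x3 image with a single white pixel in the center
img = np.zeros((3, 3, 3), dtype=np.uint8)
img[1, 1] = (255, 255, 255)
print(rgb_to_class_map(img))
```

Raising on unknown colors is a design choice: masks saved as JPEG or with anti-aliased edges can contain intermediate colors, in which case thresholding, as in the previous post, may be more practical.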

Patrice · August 22, 2020, 1:28am

Thank you for your suggestions. Unfortunately, I am facing some hardware problems presently. I will give it a try and give you feedback after fixing my computer.