FCN_resnet101 implementation help

satrya-sabeni · February 28, 2020, 1:13pm

Hi,

I’m trying to to train the fcn_resnet101 model with my own data to do image semantic segmentation. I’m trying do this implement this by trying to use the fine tuning tutorial, structure. Right now I am able to train, but I’m pretty sure I’m not using the right loss functions, and optimizers and probably a lot more missing functions. I am just beginning with Pytorch fyi.

My images come in batch shape: (N, 3, 256, 256)
My mask come in batch shape: (N, 256, 256) # each pixel is a class 0,1 or 2

I changed the last classifier layer of the to my 3 amount of classes (bg, window frame, glass), like this ( I hope this is the correct way):

num_classes = 3  # (background, kozine, glass)
model.classifier[4] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))

These are following model params I have put in the model:

# model params
num_epochs = 3
criterion = nn.modules.loss.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
dataloader = DataLoader(trainset, batch_size=5)

The model training code itself.

start_time = time.time()
    
best_model = copy.deepcopy(model.state_dict())
best_acc = 0.0
val_acc_hist = []

for epoch in range(num_epochs):
    print('Epoch: {}/ {}'.format(epoch + 1, num_epochs))

    epoch_loss = 0    
    running_loss = 0.0
    running_corrects = 0

    # Iterate batches over dataloader
    for k, (inputs, labels) in enumerate(dataloader):
        print(f'processing batch {k}...')
        
        # move to gpu 
        _inputs = inputs.cuda()
        _labels = labels.cuda()

        # zero the gradients again
        optimizer.zero_grad()

        # forward
        outputs = model(_inputs) # one batch forward 
        loss1 = criterion(outputs['out'], _labels.long()) # get score of output
        loss2 = criterion(outputs['aux'], _labels.long())
        loss = loss1 + 0.4*loss2
        loss.backward()
        optimizer.step() # move weights

        # save preds and take highest prediction
        _, preds = torch.max(outputs['out'], 1)

        running_loss += loss.item() * inputs.size(0) # length of the batch
        running_corrects += torch.sum(preds == _labels.data) # amount of correct pixels in batch

    # epoch statistics
    epoch_loss = running_loss / len(dataloader.dataset) # avg loss of whole dataset
    epoch_acc = (running_corrects.double() / len(dataloader.dataset)) / 262144 # avg correct pixels

    print('Loss: {:.4f} Acc: {:.4f}'.format(epoch_loss, epoch_acc))

    # evaluation phase
    if epoch_acc > best_acc:
        best_acc = epoch_acc
        best_model_wts = copy.deepcopy(model.state_dict())

    val_acc_hist.append(epoch_acc)


# after finishing training
time_elapsed = time.time() - start_time
print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
print('Best val acc: {:4f}'.format(best_acc))

test_model = model.load_state_dict(best_model_wts)

The loss, and accuracy is just stays the same for most part, I definitely am doing it wrong.

epoch 1 - Loss: 0.0056 Acc: 0.0823
epoch 2 - Loss: 0.0048 Acc: 0.0823
epoch 3 - Loss: 0.0042 Acc: 0.0823

Can anybody point where to go from here to fix it? Or is there an official implementation where I can look it like the mask rcnn tutorial?

Any help is appreciated!

ptrblck · February 29, 2020, 6:04am

The code looks fine.
You should initialize the criterion as nn.CrossEntropyLoss, but that’s a minor issue.
Also, don’t use the .data attribute, as it’s deprecated and might yield unwanted side effects.

As the loss goes down, the model is in fact training.
I would recommend to try out different optimizers (e.g. Adam) and also play around with some hyperparameters (e.g. learning rate).
If that doesn’t help, try to overfit a small data sample and make sure your model is able to overfit these samples ~perfectly.

satrya-sabeni · March 3, 2020, 9:14am

Thanks I have implemented your tips and improved the code. The actual error was from the way the way I loaded in my masks. Initially they were just pixels from [0, 1, 2], but I used torchvision.transforms.functional.to_tensor which converts it between 0-1 so it became [0., 0.00392157, 0.00784314] instead. Using torch.from_numpy() worked for me which converted it without changing the numbers. And nn.CrossEntropyLoss takes unnormalised label scores. So now it’s working pretty decent