Hi,
I’m trying to train the fcn_resnet101 model on my own data to do semantic image segmentation, following the structure of the fine-tuning tutorial. Right now I am able to train, but I’m pretty sure I’m not using the right loss function or optimizer, and I’m probably missing more pieces. I’m just getting started with PyTorch, FYI.
My images come in batches of shape: (N, 3, 256, 256)
My masks come in batches of shape: (N, 256, 256) # each pixel is a class: 0, 1 or 2
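For context, my Dataset returns (image, mask) pairs roughly like this (a minimal sketch, not my exact code; SegmentationSet, images and masks are placeholder names):

import torch
from torch.utils.data import Dataset

class SegmentationSet(Dataset):
    # placeholder dataset: yields a (3, 256, 256) float image and a (256, 256) class-index mask
    def __init__(self, images, masks):
        self.images = images  # sequence of (3, 256, 256) arrays/tensors
        self.masks = masks    # sequence of (256, 256) arrays with values 0, 1 or 2

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image = torch.as_tensor(self.images[idx], dtype=torch.float32)
        mask = torch.as_tensor(self.masks[idx], dtype=torch.long)  # CrossEntropyLoss expects long labels
        return image, mask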
I changed the model’s last classifier layer to output my 3 classes (background, window frame, glass) like this (I hope this is the correct way):
num_classes = 3  # (background, window frame, glass)
model.classifier[4] = nn.Conv2d(512, num_classes, kernel_size=(1,1), stride=(1,1))
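Since the training loop below also uses the model’s 'aux' output, I assume the auxiliary head’s last layer needs the same replacement (in torchvision’s fcn_resnet101 the aux head’s final conv has 256 input channels, if I read the source right):

model.aux_classifier[4] = nn.Conv2d(256, num_classes, kernel_size=(1, 1), stride=(1, 1))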
These are the training parameters I’m using:
# model params
num_epochs = 3
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005)
dataloader = DataLoader(trainset, batch_size=5, shuffle=True)  # shuffle so batches differ each epoch
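For completeness, this is roughly how I load and prepare the model before the loop (a sketch; I’m assuming the pretrained and aux_loss arguments from the torchvision docs):

import copy
import time
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader

# pretrained FCN with the auxiliary head enabled (the loop below reads outputs['aux'])
model = torchvision.models.segmentation.fcn_resnet101(pretrained=True, aux_loss=True)
# ... replace the two classifier heads as shown above ...
model = model.cuda()  # inputs/labels are moved to the GPU below, so the model must live there too
model.train()         # training mode (affects batch norm and dropout)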
The training code itself:
start_time = time.time()
best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0
val_acc_hist = []

for epoch in range(num_epochs):
    print('Epoch: {}/{}'.format(epoch + 1, num_epochs))
    running_loss = 0.0
    running_corrects = 0

    # iterate over batches from the dataloader
    for k, (inputs, labels) in enumerate(dataloader):
        print(f'processing batch {k}...')
        # move the batch to the GPU
        _inputs = inputs.cuda()
        _labels = labels.cuda()
        # zero the gradients
        optimizer.zero_grad()
        # forward pass over one batch
        outputs = model(_inputs)
        loss1 = criterion(outputs['out'], _labels.long())  # loss on the main head
        loss2 = criterion(outputs['aux'], _labels.long())  # loss on the auxiliary head
        loss = loss1 + 0.4 * loss2
        loss.backward()
        optimizer.step()  # update weights
        # take the highest-scoring class per pixel as the prediction
        _, preds = torch.max(outputs['out'], 1)
        running_loss += loss.item() * inputs.size(0)          # loss summed over the batch
        running_corrects += torch.sum(preds == _labels.data)  # correct pixels in the batch

    # epoch statistics
    epoch_loss = running_loss / len(dataloader.dataset)  # avg loss per image
    epoch_acc = (running_corrects.double() / len(dataloader.dataset)) / (256 * 256)  # fraction of correct pixels per image
    print('Loss: {:.4f} Acc: {:.4f}'.format(epoch_loss, epoch_acc))

    # keep the best weights seen so far
    if epoch_acc > best_acc:
        best_acc = epoch_acc
        best_model_wts = copy.deepcopy(model.state_dict())
    val_acc_hist.append(epoch_acc)

# after training finishes
time_elapsed = time.time() - start_time
print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
print('Best val acc: {:.4f}'.format(best_acc))
model.load_state_dict(best_model_wts)  # load_state_dict restores the weights in place
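After training I planned to sanity-check the best weights with a quick prediction pass, roughly like this (a sketch; some_batch is a placeholder (N, 3, 256, 256) batch):

model.eval()  # inference mode for batch norm / dropout
with torch.no_grad():
    scores = model(some_batch.cuda())['out']  # (N, num_classes, 256, 256) raw class scores
    preds = scores.argmax(dim=1)              # (N, 256, 256) predicted class per pixel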
The loss goes down slightly, but the accuracy stays exactly the same, so I’m definitely doing something wrong:
epoch 1 - Loss: 0.0056 Acc: 0.0823
epoch 2 - Loss: 0.0048 Acc: 0.0823
epoch 3 - Loss: 0.0042 Acc: 0.0823
Can anybody point out where to go from here to fix this? Or is there an official implementation I can look at, like the Mask R-CNN tutorial?
Any help is appreciated!