Is this a problem with the model, or is something wrong in my training process or in the way I am feeding input to the model?

I have used an FCN-8s model with a VGG backbone for an image segmentation task on the PASCAL VOC 2012 dataset, but after training the model I am unable to get the desired segmentation results. I have used this link

to build the FCN model with the dataset that is available in torchvision.datasets.

These are some screenshots of my project. I couldn’t figure out where the problem is, since the model has minimal loss but the segmentation does not work as I had hoped.

What is not working at the moment, i.e. are the outputs not as expected?
Did the model work fine during training and validation (were the outputs reasonable)?

Using a U-Net and an FCN-VGG model, I am unable to get the desired accuracy on the PASCAL VOC 2012 segmentation dataset; with 10 epochs I am only getting accuracy ranging from 27% to 45%. I have used the dataset available in torchvision.datasets as:

import torchvision
import torchvision.transforms as T

train_set = torchvision.datasets.VOCSegmentation(
    root='drive/My Drive/VOC',
    year='2012',
    image_set='train',
    download=True,
    transform=T.Compose([
        T.Resize(144),
        T.CenterCrop(128),
        T.ToTensor()
    ]),
    target_transform=T.Compose([
        T.Resize(144),
        T.CenterCrop(128),
        T.ToTensor()
    ]),
    transforms=None)

and a batch size of 64:

from torch.utils import data

train_loader = data.DataLoader(train_set, batch_size=64, shuffle=True)

and my train.py is:

import time
import torch

train_Accuracy = []
train_Loss = []
dictionary = {}
for epoch in range(8):
  total_loss = 0
  total_correct = 0
  total_train = 0
  correct_train = 0
  ts = time.time()

  for batch in train_loader:
    images, labels = batch

    optimizer.zero_grad()

    preds = model(images)
    loss = criterion(preds, labels)

    loss.backward()
    optimizer.step()

    total_loss += loss.item()
    # total_correct += get_num_correct(preds, labels)

    # pixel accuracy: compare the predicted class of each pixel with the target
    _, predicted = torch.max(preds.data, 1)
    total_train += labels.nelement()
    correct_train += predicted.eq(labels.data).sum().item()
    train_accuracy = 100 * correct_train / total_train

  train_Loss.append(round(loss.item(), 5))
  train_Accuracy.append(round(train_accuracy, 4))

  dictionary = {
      "epoch": epoch + 1,
      "loss": round(loss.item(), 5),
      "accuracy": round(train_accuracy, 4),
      "training time": time.time() - ts,
      "model": model.state_dict()
  }
  torch.save(dictionary, 'data_to_be_saved_{}.pth'.format(epoch+1))
  print('epoch: ', epoch + 1,
        # ' total_correct: ', total_correct,
        # ' loss: ', total_loss/len(train_loader),
        ' loss: ', round(loss.item(), 5),
        ' accuracy: ', round(train_accuracy, 4),
        ' training time: ', time.time() - ts)

I would recommend trying to overfit a small data sample and making sure your model can successfully predict these samples.
If that’s not the case, there might be a bug in the code I’m missing, or the architecture is not suitable for the task.
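A minimal sketch of what overfitting a tiny subset could look like (assuming the train_set, model, criterion, and optimizer defined above; the subset size and step count are arbitrary):

from torch.utils import data

# Train repeatedly on a handful of fixed samples; a working model/loss setup
# should be able to drive the loss close to zero on them.
small_set = data.Subset(train_set, list(range(4)))
small_loader = data.DataLoader(small_set, batch_size=4, shuffle=False)

model.train()
for step in range(200):
    for images, labels in small_loader:
        optimizer.zero_grad()
        preds = model(images)
        loss = criterion(preds, labels)
        loss.backward()
        optimizer.step()
    if step % 50 == 0:
        print('step {}, loss {:.5f}'.format(step, loss.item()))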

Thanks, Sir, the problem was in my dataset and I fixed it; with that the accuracy is quite good, but using that model to segment a single image is just not working:

from PIL import Image

trans_img_tensor = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor()
])

image = Image.open('rose.jpeg')
a = trans_img_tensor(image)
c = a.unsqueeze(0)
label = model(c)

trans_tensor_2_img = T.Compose([
    T.ToPILImage()
])
d = trans_tensor_2_img(label.argmax(1).float())
d

Your code is a bit hard to read, but did you call model.eval() before trying to test the model?
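For reference, a rough sketch of what the inference call could look like with evaluation mode and no gradient tracking (reusing the c tensor from your snippet):

import torch

model.eval()  # disable dropout and use the running batch-norm statistics

with torch.no_grad():  # no gradients are needed for inference
    label = model(c)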

Sir, I am trying to load a saved model and use it on an image. I have used

state_dict()

to save the model and using

model = TheModelClass(*args, **kwargs)
model.load_state_dict(dictionary['model'])

to load the model, but *args and **kwargs are not defined. What should I use for them? I have read your previous answer on this but wasn’t quite able to figure out what to pass.
The model used above is a CNN, while I am trying to load a U-Net.

import torch.nn as nn
import torch.nn.functional as F

class TheModelClass(nn.Module):
    def __init__(self):
        super(TheModelClass, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Initialize model
model = TheModelClass()

The args and kwargs are just placeholders in case your model’s __init__ takes some arguments.
Since yours only uses self, you don’t need to pass anything to the initialization.
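For example, a small sketch using the checkpoint dictionary saved in your training loop above (the epoch-8 filename is just the last one that loop would produce):

import torch

model = TheModelClass()  # __init__ takes no arguments, so nothing is passed
checkpoint = torch.load('data_to_be_saved_8.pth')
model.load_state_dict(checkpoint['model'])  # the state_dict was stored under the 'model' key
model.eval()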

With your guidance I was able to load the model, but even using this

def image_segmentation(path):
  image = Image.open(path)
  trans_image = trans(image)                  # resize / crop / ToTensor pipeline
  model.eval()
  dim_added_image = trans_image.unsqueeze(0)  # add batch dimension
  segment = model(dim_added_image)
  segmented_image_tensor = segment.argmax(1).float()  # per-pixel class indices
  segmented_image = trans_2_img(segmented_image_tensor)
  return segmented_image

I am still not able to segment the image that I feed to it.