How do I build a convolutional neural network to predict a face pose point map in PyTorch?

I am following the official PyTorch tutorial to prepare a face pose point map dataset, but the tutorial does not include code for a predictive model.

I built a CNN with the Adam optimizer and MultiLabelSoftMarginLoss, because the model needs to predict an array of points representing the face point map for each input image. It is not working.
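
As a sanity check on that loss choice, here is a minimal plain-Python sketch of what MultiLabelSoftMarginLoss computes, following the formula given in the PyTorch documentation, which describes targets in {0, 1}. The helper names and the example coordinate 100.0 are illustrative assumptions and are not taken from the notebook. With a pixel coordinate as the target, this loss keeps decreasing as the prediction grows past the target, whereas a mean squared error is smallest exactly at the target.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def multilabel_soft_margin(outputs, targets):
    # Mean over outputs of -[y * log(sigmoid(x)) + (1 - y) * log(sigmoid(-x))].
    terms = [-(y * log_sigmoid(x) + (1.0 - y) * log_sigmoid(-x))
             for x, y in zip(outputs, targets)]
    return sum(terms) / len(terms)


def mse(outputs, targets):
    # Mean squared error, a common regression loss.
    return sum((x - y) ** 2 for x, y in zip(outputs, targets)) / len(outputs)


target = [100.0]  # hypothetical landmark coordinate in pixels
for pred in (0.0, 100.0, 1000.0):
    print(pred, multilabel_soft_margin([pred], target), mse([pred], target))
```

If this reading is right, the optimizer can lower the soft-margin loss indefinitely by inflating the outputs, which might explain why training never settles.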

The code for the dataset preparation is the following:

My detailed code and comments are the following:

The code for the model and training is the following:


import torch
import torch.nn as nn


class ConvNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ConvNet, self).__init__()
        # Two conv + max-pool stages; each pool halves the spatial size.
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 18, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0))
        self.layer2 = nn.Sequential(
            nn.Conv2d(18, 32, kernel_size=3, stride=1, padding=1),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=0))
        # 32 * 56 * 56 implies 224x224 inputs (224 / 2 / 2 = 56).
        self.fc = nn.Linear(32 * 56 * 56, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)  # flatten to (batch, features)
        out = self.fc(out)
        return out


model = ConvNet(num_classes).to(device)

# Loss and optimizer
criterion = nn.MultiLabelSoftMarginLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, sample_batched in enumerate(train_loader):
        images_batch, landmarks_batch = \
            sample_batched['image'], sample_batched['landmarks']

        images = images_batch
        labels = landmarks_batch.reshape(-1, 68 * 2)

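        # Convert to float32 and move the batch to the model's device
        # (.to(device) is assumed here, since the model itself is on device).
        images = images.float().to(device)
        labels = labels.float().to(device)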

        # Forward pass
        outputs = model(images)

        loss = criterion(outputs, labels.float())

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 5 == 0:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))
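
The `32 * 56 * 56` input size of the fully connected layer can be checked by hand. The sketch below applies the standard output-size formula for convolution and pooling layers, floor((size + 2 * padding - kernel) / stride) + 1, to the two stages of the model. It assumes square 224x224 inputs, which the post does not state explicitly; the function names are illustrative.

```python
def out_size(size, kernel, stride=1, padding=0):
    # Spatial output size of a conv or pooling layer along one dimension.
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(input_size, channels=32, stages=2):
    # Each stage: 3x3 conv (stride 1, padding 1), then 2x2 max pool (stride 2).
    size = input_size
    for _ in range(stages):
        size = out_size(size, kernel=3, stride=1, padding=1)  # conv keeps the size
        size = out_size(size, kernel=2, stride=2, padding=0)  # pool halves it
    return size, channels * size * size


side, features = flattened_features(224)  # 224 is an assumed input size
print(side, features)
```

If the images fed to the model have a different size, the first argument of `nn.Linear` would need to change to match.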

Expected results:

  • Prediction of the face point map

Actual results:

  • The model does not find an optimal local minimum during training.
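
Since predicting landmark coordinates is a regression problem, one thing that may be worth trying is a regression loss such as `nn.MSELoss`, with targets scaled to [0, 1]. The following is a sketch under stated assumptions, not a verified fix: the random tensors are hypothetical stand-ins for one batch from the data loader (4 images of 224x224 with 68 landmarks each), and the small network is a stand-in regressor, not the ConvNet from the post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for one batch: images and landmarks in pixel units.
images = torch.rand(4, 3, 224, 224)
landmarks = torch.rand(4, 68, 2) * 224

# Scale targets to [0, 1] and flatten to one 136-element vector per image.
targets = (landmarks / 224.0).reshape(-1, 68 * 2)

# Small stand-in regressor with a nonlinearity between conv and linear layers.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(4),
    nn.Flatten(),
    nn.Linear(8 * 4 * 4, 68 * 2),
)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Repeatedly fit the single batch and record the loss at each step.
losses = []
for step in range(50):
    optimizer.zero_grad()
    loss = criterion(model(images), targets)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Predictions from a model trained this way would be multiplied by the image size to get back to pixel coordinates.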

For detailed code and comments, see the following Jupyter notebook.
