Simple 2-class MLP

Dear Community

I would like to build a simple MLP that assigns class A or B to a given input.

The input data are 128-dimensional feature representations extracted from FaceNet.
Input data (X): shape=(23445, 128), dtype=float64

The target data are the binary class labels 0 or 1 (denoting class A or B).
Target data (y): shape=(23445, 1), array([0, 0, 1, …, 1, 1, 1]), dtype=int64

# Imports used by the snippets below
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 100)
        self.fc2 = nn.Linear(100, 2)

    def forward(self, x):
        x = x.view(-1, 128)
        x = F.relu(self.fc1(x))
        x = F.softmax(self.fc2(x))
        return x

net = Net()
# Create 2d array from target data
targets = np.empty((len(y), 2))
for i in range(0, len(y)):
    if(y[i] == 0):
        targets[i, 0] = 1
        targets[i, 1] = 0
    else:
        targets[i, 0] = 0
        targets[i, 1] = 1

Result

array([[1., 0.],
       [1., 0.],
       [0., 1.],
       ...,
       [0., 1.],
       [0., 1.],
       [0., 1.]])
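As a side note, the same one-hot array can be built without the explicit loop; a minimal sketch, assuming y is the (23445, 1) int64 array from above:

# Vectorized equivalent of the loop above: row i becomes [1., 0.] for class 0 and [0., 1.] for class 1
targets = np.eye(2)[y.ravel()]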
# Convert inputs, targets to tensor
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
inputs, targets = torch.from_numpy(X), torch.from_numpy(targets)
inputs, targets = inputs.type(torch.FloatTensor), targets.type(torch.FloatTensor)
inputs = inputs.to(device)
targets = targets.to(device)
net.to(device)

net.train()
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001)
running_loss = 0.0
# zero the parameter gradients
optimizer.zero_grad()

# forward
# inputs [23445x128]
# outputs [23445x2]
outputs = net(inputs)
# Result Forward Pass
tensor([[ 0.4894,  0.5106],
        [ 0.4900,  0.5100],
        [ 0.4897,  0.5103],
        ...,
        [ 0.4813,  0.5187],
        [ 0.4825,  0.5175],
        [ 0.4889,  0.5111]], device='cuda:0')
# outputs [23445x2]
# targets [23445x2]
# batch size 23445
loss = criterion(outputs, targets)
print(loss)
loss.backward()
optimizer.step()

Result Loss

tensor(0.7241, device='cuda:0')

I hope you have some advice on how to approach this problem.

Thank you very much

I’m not sure what the current issue is, but your current setup has some minor bugs.

Since you are using two output neurons to represent the logits of both classes, you should use nn.LogSoftmax + nn.NLLLoss or raw logits + nn.CrossEntropyLoss for your loss function. Note that nn.BCEWithLogitsLoss applies a sigmoid internally, so feeding it softmax probabilities squashes the outputs a second time and the loss cannot get close to zero.

Also, you don’t need to convert your target to a one-hot representation. Just leave y as the class indices.
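A minimal sketch of the second option (raw logits + nn.CrossEntropyLoss), assuming X and y are the numpy arrays from the first post; the key changes are dropping the softmax and passing the class indices directly:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 100)
        self.fc2 = nn.Linear(100, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)  # raw logits, no softmax

net = Net()
criterion = nn.CrossEntropyLoss()  # applies log_softmax + NLLLoss internally

inputs = torch.from_numpy(X).float()             # (23445, 128)
targets = torch.from_numpy(y).squeeze(1).long()  # (23445,) class indices, no one-hot

loss = criterion(net(inputs), targets)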


Thank you very much for the fast reply.

Let's say I leave y as the class indices, like this:
Target data (y): shape=(23445, 1), array([0, 0, 1, …, 1, 1, 1]), dtype=int64

And I change the network architecture to have only 1 output neuron

# Network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 100)
        self.fc2 = nn.Linear(100, 1)

    def forward(self, x):
        x = x.view(-1, 128)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()

Do you have a recommendation on how to make this work?

Or should I stay with 2 output neurons and use nn.CrossEntropyLoss, treating it as a multi-class problem?

You have different options for a binary classification use case:

  • you could use 1 output neuron and [nn.Sigmoid + nn.BCELoss]
  • 1 output neuron and [raw logits (no non-linearity for the last layer) + nn.BCEWithLogitsLoss]
  • 2 output neurons and [nn.LogSoftmax + nn.NLLLoss]
  • 2 output neurons and [raw logits + nn.CrossEntropyLoss]

So basically you can decide if you want to use one neuron or two for a binary classification task.
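For example, a minimal sketch of option 2 (1 output neuron, raw logits + nn.BCEWithLogitsLoss), assuming the single-output Net from the previous post and the numpy arrays X and y from the first post:

net = Net()                          # fc2 = nn.Linear(100, 1), forward returns raw logits
criterion = nn.BCEWithLogitsLoss()   # applies the sigmoid internally

inputs = torch.from_numpy(X).float()   # (23445, 128)
targets = torch.from_numpy(y).float()  # (23445, 1), values 0.0 / 1.0

logits = net(inputs)                   # (23445, 1)
loss = criterion(logits, targets)

# For predictions at inference time:
probs = torch.sigmoid(logits)
preds = (probs > 0.5).long()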


Thank you very much!
I will try these options.


Option 4 (2 output neurons and [raw logits + nn.CrossEntropyLoss]) works!

Although I need several thousand epochs on the whole batch [23445, 128], without normalization, using SGD with lr=0.1 and momentum=0.9.

I will try different hyper-parameters.

Result

# Loss
tensor(0.1288, device='cuda:0')
# Output
tensor([[ 3.0039e+00, -3.0803e+00],
        [ 1.6767e+00, -1.9099e+00],
        [-1.0335e+00,  9.7911e-01],
        ...,
        [-9.0753e-01,  7.0044e-01],
        [-4.2910e-01,  2.0413e-01],
        [-9.9083e-01,  7.8327e-01]], device='cuda:0')
# Target
tensor([ 0,  0,  1,  ...,  1,  1,  1], device='cuda:0')
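For reference, a rough sketch of this full-batch setup, assuming the 2-output Net, float inputs, and class-index targets from above; the epoch count is just a placeholder for "several thousand":

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

net.train()
for epoch in range(5000):                # placeholder for "several thousand" epochs
    optimizer.zero_grad()
    outputs = net(inputs)                # (23445, 2) raw logits
    loss = criterion(outputs, targets)   # targets: (23445,) class indices
    loss.backward()
    optimizer.step()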

Could you try training with smaller chunks of your batch? With the whole batch you only get a single parameter update per epoch, while with smaller chunks you would get many more updates per epoch.
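A minimal sketch of such a mini-batch loop with torch.utils.data.TensorDataset and DataLoader, reusing net, criterion, optimizer, and device from the snippets above; the batch size and epoch count are just examples:

from torch.utils.data import DataLoader, TensorDataset

# Keep the full tensors on the CPU and move each mini-batch to the GPU as needed
dataset = TensorDataset(inputs.cpu(), targets.cpu())
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for epoch in range(30):
    for batch_inputs, batch_targets in loader:
        batch_inputs = batch_inputs.to(device)
        batch_targets = batch_targets.to(device)

        optimizer.zero_grad()
        loss = criterion(net(batch_inputs), batch_targets)
        loss.backward()
        optimizer.step()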


Thanks for the hint, you are right! I will try this and also split the data into train and test sets.

Yes, with mini-batches the loss already decreases significantly after the first 3 epochs.

Train Samples = 18756
Train Batch Size = 16
Number Train Batches = 1172

Test Samples = 4689

Epoch 29
Train Loss 0.1350 (on all 18756 train samples)
Test Loss 0.3397 (on all 4689 test samples)

I will now start playing with the model hyper-parameters.

Thanks!