Simple 2-class MLP

Dear Community

I would like to build a simple MLP that assigns class A or B to a given input.

The input data are 128-dimensional feature representations extracted from FaceNet.
Input data (X): shape=(23445, 128), dtype=float64

The target data are the binary class labels 0 or 1 (denoting class A or B).
Target data (y): shape=(23445, 1), array([0, 0, 1, …, 1, 1, 1]), dtype=int64

# Imports used by the snippets below
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# Network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 100)
        self.fc2 = nn.Linear(100, 2)

    def forward(self, x):
        x = x.view(-1, 128)
        x = F.relu(self.fc1(x))
        x = F.softmax(self.fc2(x))
        return x

net = Net()
# Create 2d array from target data
targets = np.empty((len(y), 2))
for i in range(0, len(y)):
    if(y[i] == 0):
        targets[i, 0] = 1
        targets[i, 1] = 0
    else:
        targets[i, 0] = 0
        targets[i, 1] = 1

Result

array([[1., 0.],
       [1., 0.],
       [0., 1.],
       ...,
       [0., 1.],
       [0., 1.],
       [0., 1.]])
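As a side note, the same one-hot array can be built without the explicit loop; a minimal sketch, assuming y is the (23445, 1) int64 array from above:

# Vectorized equivalent of the loop above: row i becomes [1., 0.] for class 0 and [0., 1.] for class 1
targets = np.eye(2)[y.ravel()]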
# Convert inputs, targets to tensor
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
inputs, targets = torch.from_numpy(X), torch.from_numpy(targets)
inputs, targets = inputs.type(torch.FloatTensor), targets.type(torch.FloatTensor)
inputs = inputs.to(device)
targets = targets.to(device)
net.to(device)

net.train()
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001)
running_loss = 0.0
# zero the parameter gradients
optimizer.zero_grad()

# forward
# inputs [23445x128]
# outputs [23445x2]
outputs = net(inputs)
# Result Forward Pass
tensor([[ 0.4894,  0.5106],
        [ 0.4900,  0.5100],
        [ 0.4897,  0.5103],
        ...,
        [ 0.4813,  0.5187],
        [ 0.4825,  0.5175],
        [ 0.4889,  0.5111]], device='cuda:0')
# outputs [23445x2]
# targets [23445x2]
# batch size 23445
loss = criterion(outputs, targets)
print(loss)
loss.backward()
optimizer.step()

Result Loss

tensor(0.7241, device='cuda:0')

I hope you have some advice on how to approach this problem.

Thank you very much

I’m not sure what the current issue is, but your current setup has some minor bugs.

Since you are using two output neurons to represent the logits of both classes, you should use nn.LogSoftmax + nn.NLLLoss or raw logits + nn.CrossEntropyLoss for your loss function. Note that nn.BCEWithLogitsLoss applies a sigmoid internally, so feeding it softmax probabilities squashes the outputs a second time and the loss cannot get close to zero.

Also, you don’t need to convert your target to a one-hot representation. Just leave y as the class indices.
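A minimal sketch of the second option (raw logits + nn.CrossEntropyLoss), assuming X and y are the numpy arrays from the first post; the key changes are dropping the softmax and passing the class indices directly:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 100)
        self.fc2 = nn.Linear(100, 2)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return self.fc2(x)  # raw logits, no softmax

net = Net()
criterion = nn.CrossEntropyLoss()  # applies log_softmax + NLLLoss internally

inputs = torch.from_numpy(X).float()             # (23445, 128)
targets = torch.from_numpy(y).squeeze(1).long()  # (23445,) class indices, no one-hot

loss = criterion(net(inputs), targets)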


Thank you very much for the fast reply.

Let's say I leave y as the class indices, like this:
Target data (y): shape=(23445, 1), array([0, 0, 1, …, 1, 1, 1]), dtype=int64

And I change the network architecture to have only 1 output neuron

# Network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(128, 100)
        self.fc2 = nn.Linear(100, 1)

    def forward(self, x):
        x = x.view(-1, 128)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net = Net()

Do you have a recommendation on how to make this work?

Or should I stay with 2 output neurons and use nn.CrossEntropyLoss, treating it as a multi-class problem?

You have different options for a binary classification use case:

  • you could use 1 output neuron and [nn.Sigmoid + nn.BCELoss]
  • 1 output neuron and [raw logits (no non-linearity for the last layer) + nn.BCEWithLogitsLoss]
  • 2 output neurons and [nn.LogSoftmax + nn.NLLLoss]
  • 2 output neurons and [raw logits + nn.CrossEntropyLoss]

So basically you can decide if you want to use one neuron or two for a binary classification task.
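For example, a minimal sketch of option 2 (1 output neuron, raw logits + nn.BCEWithLogitsLoss), assuming the single-output Net from the previous post and the numpy arrays X and y from the first post:

net = Net()                          # fc2 = nn.Linear(100, 1), forward returns raw logits
criterion = nn.BCEWithLogitsLoss()   # applies the sigmoid internally

inputs = torch.from_numpy(X).float()   # (23445, 128)
targets = torch.from_numpy(y).float()  # (23445, 1), values 0.0 / 1.0

logits = net(inputs)                   # (23445, 1)
loss = criterion(logits, targets)

# For predictions at inference time:
probs = torch.sigmoid(logits)
preds = (probs > 0.5).long()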


Thank you very much!
I will try these options.


Option 4 (2 output neurons and [raw logits + nn.CrossEntropyLoss]) works!

Although I need several thousand epochs on the whole batch [23445, 128], without normalization, using SGD with lr=0.1 and momentum=0.9.

I will try different hyper-parameters.

Result

# Loss
tensor(0.1288, device='cuda:0')
# Output
tensor([[ 3.0039e+00, -3.0803e+00],
        [ 1.6767e+00, -1.9099e+00],
        [-1.0335e+00,  9.7911e-01],
        ...,
        [-9.0753e-01,  7.0044e-01],
        [-4.2910e-01,  2.0413e-01],
        [-9.9083e-01,  7.8327e-01]], device='cuda:0')
# Target
tensor([ 0,  0,  1,  ...,  1,  1,  1], device='cuda:0')
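For reference, a rough sketch of this full-batch setup, assuming the 2-output Net, float inputs, and class-index targets from above; the epoch count is just a placeholder for "several thousand":

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9)

net.train()
for epoch in range(5000):                # placeholder for "several thousand" epochs
    optimizer.zero_grad()
    outputs = net(inputs)                # (23445, 2) raw logits
    loss = criterion(outputs, targets)   # targets: (23445,) class indices
    loss.backward()
    optimizer.step()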

Could you try training with smaller chunks of your batch? With the whole batch you only get a single parameter update per epoch, while with smaller chunks you would get many more updates per epoch.
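A minimal sketch of such a mini-batch loop with torch.utils.data.TensorDataset and DataLoader, reusing net, criterion, optimizer, and device from the snippets above; the batch size and epoch count are just examples:

from torch.utils.data import DataLoader, TensorDataset

# Keep the full tensors on the CPU and move each mini-batch to the GPU as needed
dataset = TensorDataset(inputs.cpu(), targets.cpu())
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for epoch in range(30):
    for batch_inputs, batch_targets in loader:
        batch_inputs = batch_inputs.to(device)
        batch_targets = batch_targets.to(device)

        optimizer.zero_grad()
        loss = criterion(net(batch_inputs), batch_targets)
        loss.backward()
        optimizer.step()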


Thanks for the hint, you are right! I will try this and also split the data into train and test sets.

Yes, with mini-batches the loss already decreases significantly after the first 3 epochs.

Train Samples = 18756
Train Batch Size = 16
Number Train Batches = 1172

Test Samples = 4689

Epoch 29
Train Loss 0.1350 (on all 18756 train samples)
Test Loss 0.3397 (on all 4689 test samples)

I will now start playing with the model hyper-parameters.

Thanks!