Loss computation in binary classification

Fairly new to PyTorch & the neural nets world, so bear with me.
Below is a code snippet from a binary classification task done with a simple 3-layer network:

import torch
import torch.nn as nn

n_input_dim = X_train.shape[1]
n_hidden = 100  # Number of hidden nodes
n_output = 1   # Number of output nodes = 1 for a binary classifier
# Build the network
model = nn.Sequential(
    nn.Linear(n_input_dim, n_hidden),
    nn.ELU(),
    nn.Linear(n_hidden, n_output),
    nn.Sigmoid())

x_tensor = torch.from_numpy(X_train.values).float()
tensor([[ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000, -99.0000, -99.0000],
       [ -1.0000,  -1.0000,  -1.0000,  ...,   0.1538,   5.0000,   0.1538],
       [ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000,   6.0000,   0.2381],
       ...,
       [ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000, -99.0000, -99.0000],
       [ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000, -99.0000, -99.0000],
       [ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000, -99.0000, -99.0000]])
y_tensor = torch.from_numpy(Y_train).float()
tensor([0., 0., 1.,  ..., 0., 0., 0.])
#Loss Computation
loss_func = nn.BCELoss()
#Optimizer
learning_rate = 0.0001
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

train_loss = []
iters = 500
for i in range(iters):
    y_pred = model(x_tensor)
    loss = loss_func(y_pred, y_tensor.unsqueeze(1))  # reshape targets to (N, 1) to match y_pred
    print("Loss in iteration:", i, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    train_loss.append(loss.item())

In the above case, what I'm not sure about is that the loss is being computed on y_pred, which is a set of probabilities computed by the model on the training data, against y_tensor (which is binary 0/1). Is this way of computing the loss fine for a classification problem in PyTorch? Shouldn't the loss ideally be computed between two sets of probabilities? And if it is fine, does the loss function, BCELoss here, scale the input in some manner?

Also, the x tensor ranges over all sorts of values. Do I need to scale it before feeding it into the network?

The loss function is being evaluated correctly; you can get some hints on how it is computed in the documentation. It would work perfectly fine even if the targets were soft probabilities rather than hard 0/1 labels.

I'm not sure what you mean by BCELoss scaling the input, but the answer is no: it applies the binary cross-entropy formula directly to the probabilities it receives.
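To make that concrete, here is a minimal sketch (with made-up numbers, not taken from your snippet) showing that BCELoss just evaluates the binary cross-entropy formula on whatever probabilities you hand it, and that both hard 0/1 labels and soft targets are accepted:

import torch
import torch.nn as nn

# Made-up probabilities and targets, just to illustrate the computation
probs = torch.tensor([0.9, 0.2, 0.7])        # model outputs after the Sigmoid
targets = torch.tensor([1.0, 0.0, 1.0])      # hard 0/1 labels

bce = nn.BCELoss()(probs, targets)
manual = -(targets * torch.log(probs) + (1 - targets) * torch.log(1 - probs)).mean()
print(bce.item(), manual.item())             # both ~0.2284, no scaling involved

# Soft targets (probabilities instead of 0/1) are accepted as well
soft_targets = torch.tensor([0.8, 0.1, 0.6])
print(nn.BCELoss()(probs, soft_targets).item())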

Yes, it is advisable to normalize your data before training; it gives the optimization step better numerical stability.
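For example, a common sketch (reusing the X_train name from your snippet; the X_test split below is assumed and only there for illustration) is to standardize with statistics computed on the training set only:

import torch
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training data only, then reuse it for any other split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.values)
X_test_scaled = scaler.transform(X_test.values)   # X_test assumed to exist

x_tensor = torch.from_numpy(X_train_scaled).float()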

Thanks for the inputs. Also, after the 500 iterations, can y_pred be taken as the resulting probability output and used to generate a confusion matrix with scikit-learn? Is something like the code snippet below a fine approach for computing the metrics?


output_pred = y_pred.detach().numpy()
print("output_pred")
print(output_pred)
pred_arr = []
for i in range(len(output_pred)):
    if output_pred[i] > 0.5:
        pred_arr.append(1)
    else:
        pred_arr.append(0)

conf_matrix = confusion_matrix(Y_train, pred_arr)  # sklearn expects (y_true, y_pred)

Seems alright to me!
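One small refinement you might consider (just a sketch, reusing the names from your snippet): run the final forward pass under torch.no_grad(), threshold in a vectorized way, and keep in mind that scikit-learn's confusion_matrix takes (y_true, y_pred) in that order:

import torch
from sklearn.metrics import confusion_matrix

model.eval()                                   # no effect on this particular model, but a good habit
with torch.no_grad():                          # no need to track gradients for evaluation
    probs = model(x_tensor).numpy().ravel()    # probabilities in [0, 1], shape (N,)

preds = (probs > 0.5).astype(int)              # threshold at 0.5
print(confusion_matrix(Y_train, preds))        # rows = true labels, columns = predictions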

It could be useful to go through the official tutorials to get a better grasp of PyTorch's features. Some other useful resources are PyTorch Examples and the FastAI MOOC.