# Loss computation in binary classification

Fairly new to the PyTorch & neural net world, so bear with me.
Below is a code snippet from a binary classification task done with a simple 3-layer network:

```
import torch
import torch.nn as nn

n_input_dim = X_train.shape[1]
n_hidden = 100  # Number of hidden nodes
n_output = 1    # Number of output nodes = 1 for a binary classifier

# Build the network
model = nn.Sequential(
    nn.Linear(n_input_dim, n_hidden),
    nn.ELU(),
    nn.Linear(n_hidden, n_output),
    nn.Sigmoid())

x_tensor = torch.from_numpy(X_train.values).float()
# tensor([[ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000, -99.0000, -99.0000],
#         [ -1.0000,  -1.0000,  -1.0000,  ...,   0.1538,   5.0000,   0.1538],
#         [ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000,   6.0000,   0.2381],
#         ...,
#         [ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000, -99.0000, -99.0000],
#         [ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000, -99.0000, -99.0000],
#         [ -1.0000,  -1.0000,  -1.0000,  ..., -99.0000, -99.0000, -99.0000]])

y_tensor = torch.from_numpy(Y_train).float()
# tensor([0., 0., 1.,  ..., 0., 0., 0.])
y_tensor = y_tensor.unsqueeze(1)  # reshape to (N, 1) to match y_pred's shape

# Loss computation
loss_func = nn.BCELoss()

# Optimizer (e.g. Adam; any torch.optim optimizer works here)
learning_rate = 0.0001
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

train_loss = []
iters = 500
for i in range(iters):
    y_pred = model(x_tensor)
    loss = loss_func(y_pred, y_tensor)
    print("Loss in iteration", i, ":", loss.item())

    optimizer.zero_grad()  # clear gradients accumulated in the previous iteration
    loss.backward()
    optimizer.step()
    train_loss.append(loss.item())
```

In the above case, what I'm not sure about is that the loss is computed on y_pred, which is a set of probabilities produced by the model on the training data, against y_tensor (which is binary 0/1). Is this way of computing the loss fine for a classification problem in PyTorch? Shouldn't the loss ideally be computed between two sets of probabilities? If this is fine, does the loss function, BCELoss here, scale the input in some manner?

Also, the x tensor ranges over all sorts of values. Do I need to scale it before feeding it into the network?

The loss is being computed correctly; you can find some hints on how it is done in the documentation. It would work perfectly fine even if the `target` values were soft probabilities rather than hard 0/1 labels.
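For instance, here's a minimal sketch (with made-up probabilities) showing that `nn.BCELoss` happily accepts both hard 0/1 targets and soft target probabilities:

```
import torch
import torch.nn as nn

loss_func = nn.BCELoss()

# Predicted probabilities, e.g. from a sigmoid output, shape (N, 1)
y_pred = torch.tensor([[0.9], [0.2], [0.7]])

# Hard 0/1 targets work directly...
y_hard = torch.tensor([[1.0], [0.0], [1.0]])
print(loss_func(y_pred, y_hard))   # ≈ tensor(0.2284)

# ...and so do soft target probabilities in [0, 1]
y_soft = torch.tensor([[0.8], [0.1], [0.95]])
print(loss_func(y_pred, y_soft))
```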

I'm not sure what you meant by BCELoss scaling the input, but I'd say the right answer is no.
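If it helps, here's a quick check (values made up) that `nn.BCELoss` is just the plain binary cross-entropy formula applied to the inputs as given, with no rescaling:

```
import torch
import torch.nn as nn

p = torch.tensor([0.9, 0.2, 0.7])   # predicted probabilities
y = torch.tensor([1.0, 0.0, 1.0])   # targets

# -[y*log(p) + (1-y)*log(1-p)], averaged over the batch
manual = -(y * torch.log(p) + (1 - y) * torch.log(1 - p)).mean()
print(manual, nn.BCELoss()(p, y))   # both print the same value
```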

Yes, it is advisable to normalize your data before training; it improves the numerical stability of the optimization.
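For example, one common option (assuming `X_train` is the DataFrame from your snippet) is scikit-learn's `StandardScaler`; fit it on the training data only and reuse the fitted scaler later. Note that the -99 values in your printout look like missing-value sentinels, which a scaler will treat as real measurements:

```
from sklearn.preprocessing import StandardScaler
import torch

scaler = StandardScaler()  # zero mean, unit variance per feature
X_train_scaled = scaler.fit_transform(X_train.values)
x_tensor = torch.from_numpy(X_train_scaled).float()
# later: scaler.transform(X_test.values) with the same fitted scaler
```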

Thanks for the inputs. Also, after the 500 iterations, can y_pred be taken as the resulting probability output and used to generate a confusion matrix with scikit-learn? Is something like the snippet below a fine approach for computing metrics?

```
from sklearn.metrics import confusion_matrix

output_pred = y_pred.detach().numpy()  # shape (N, 1), probabilities in [0, 1]
print("output_pred")
print(output_pred)

pred_arr = []
for i in range(len(output_pred)):
    if output_pred[i, 0] > 0.5:
        pred_arr.append(1)
    else:
        pred_arr.append(0)

# scikit-learn's signature is confusion_matrix(y_true, y_pred)
conf_matrix = confusion_matrix(Y_train, pred_arr)
```

Seems alright to me!
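One small thing I'd add: `y_pred` from the last loop iteration was computed before the final `optimizer.step()`, so for the metrics it's cleaner to run a fresh forward pass with gradients disabled. A sketch:

```
from sklearn.metrics import confusion_matrix
import torch

model.eval()                 # good habit, though this model has no dropout/batchnorm
with torch.no_grad():        # no need to track gradients for evaluation
    probs = model(x_tensor)  # probabilities reflecting the final weights
pred_arr = (probs.numpy().ravel() > 0.5).astype(int)
conf_matrix = confusion_matrix(Y_train, pred_arr)
```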

It could be useful to work through the official tutorials to get a better grasp of PyTorch's features. Other helpful resources are the PyTorch Examples and the FastAI MOOC.