Binary classification model not training

Hi Pytorch community,

I’m new to Pytorch and relatively new to neural networks.
What I want to build is a network simulating a human learning task, where a stimulus of 2 dimensions with different SNRs maps onto a binary response. I have thus created my binary target vector (y) and an input vector (x) with the mean shifted positive/negative depending on the target response.

The network doesn’t seem to learn - the accuracy stays at 50% and the loss also only decreases marginally. I have played with the parameters (learning rate, weight initialisation etc), but nothing changed. I’ve now been stuck at this point for days and couldn’t find any help in the discussion forum so far, so I’d really appreciate any advice on what I’m doing wrong!

Thanks a lot.

Here’s my code:

import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np

# binary targets and a 1-d input whose sign depends on the target
y = torch.empty(10000,1, dtype=torch.float).random_(2)

batchlen = torch.Tensor.nelement(y)

x = torch.empty(10000,1, dtype=torch.float)

for t in range(batchlen):
    if y[t] == 0:
        x[t] = torch.randn(1)*(-1)
    elif y[t] == 1:
        x[t] = torch.randn(1)

sig_m = 1.3
sig_c = .1

x_m = x*sig_m
x_c = x*sig_c

x = torch.cat((x_m,x_c),1)
x.requires_grad=True

class Network(nn.Module):
    def __init__(self):
        super().__init__()

        self.hidden = nn.Linear(2,100)
        self.output = nn.Linear(100,1)

        self.relu = nn.ReLU()
        self.softmax = nn.Sigmoid()

    def weights_init(self):
        for module in self.modules():
            if isinstance(module, nn.Linear):
                nn.init.normal_(module.weight, mean = 0, std = 0.1)

    def forward(self, x):
        x = self.hidden(x)
        x = self.relu(x)
        x = self.output(x)
        x = self.softmax(x)

        return(x)

model = Network()
model.weights_init()

criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.0001)

def binary_acc(pred, target):
    pred_tag = torch.round(torch.sigmoid(pred))

    correct_results_sum = (pred_tag == target).sum().float()
    acc = correct_results_sum/target.shape[0]
    acc = torch.round(acc * 100)

    return acc

loss_all = np.zeros((batchlen))
acc_all = np.zeros((batchlen))

for e in range(batchlen):
    running_loss = 0
    running_acc = 0

    out = model(x)

    loss = criterion(out,y)
    acc = binary_acc(out,y)

    optimizer.zero_grad()

    loss.backward()
    optimizer.step()

    running_loss += loss.item()
    running_acc += acc.item()

    acc_all[e] = acc.item()
    loss_all[e] = loss.item()

else:
    print(f"Training loss: {running_loss}")
    print(f"Training accuracy: {running_acc}")

Hi atl!

The x you generate for y = 0 isn’t really different than that for y = 1.

torch.randn() draws samples from the normal distribution, and
this distribution is symmetrical about zero. Therefore randn() and
randn() * (-1) are statistically the same.

That is, your inputs for y = 0 and y = 1 aren’t actually any different,
so your network can’t learn.
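
For example (just a sketch; the means of -1 and +1 are an illustrative
choice, and later posts in this thread do this more cleanly), shifting the
mean of the noise instead of flipping its sign gives you two genuinely
different distributions:

for t in range(batchlen):
    if y[t] == 0:
        x[t] = torch.randn(1) - 1.0   # class 0: noise centred at -1
    elif y[t] == 1:
        x[t] = torch.randn(1) + 1.0   # class 1: noise centred at +1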

As a second issue, you aren’t calculating your accuracy correctly.

Here, you are applying sigmoid() twice in your accuracy calculation:
once in the last “layer” of your model, self.softmax = nn.Sigmoid(),
and then again inside your binary_acc() function, in
torch.round(torch.sigmoid(pred)).

(Calling your last layer softmax when it’s really nn.Sigmoid() is a
confusing naming choice.)

So the first sigmoid() (in model) takes the output of
self.output = nn.Linear(100,1) that ranges from -inf to inf
and maps it to [0.0, 1.0]. The second sigmoid() (in binary_acc())
maps this to [0.5, 1.0]. This is then always rounded to 1, so you
compare 100% 1’s in pred_tag to 50% 1’s in target and return
50% for your accuracy, acc.
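
If you keep your current model (with the Sigmoid at the end), the fix is
to apply sigmoid() only once, i.e. round the model output directly in
binary_acc(). A sketch:

def binary_acc(pred, target):
    # pred already went through the model's Sigmoid, so it is a
    # probability in [0.0, 1.0]; round it directly, no second sigmoid()
    pred_tag = torch.round(pred)

    correct_results_sum = (pred_tag == target).sum().float()
    acc = correct_results_sum / target.shape[0]
    return acc * 100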

(As an aside, best practices would recommend, for reasons of
numerical stability, that you get rid of the final Sigmoid in model
and use BCEWithLogitsLoss (rather than BCELoss) as your
loss function.)
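
A sketch of that change (drop the Sigmoid from the model so it returns
raw logits, and let the loss apply the sigmoid internally):

# in Network.forward(), return self.output(x) directly, without self.softmax
criterion = nn.BCEWithLogitsLoss()   # applies the sigmoid internally; numerically stabler
loss = criterion(out, y)             # out are now raw logits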

Good luck.

K. Frank

Dear K.Frank,

thank you so much for this extensive reply and all the explanations. That was a massive help.

I have now fixed the accuracy calculation and changed my loss function to BCEWithLogitsLoss and got rid of the sigmoid in my network.

I’m still a bit confused though about how I should initialise my inputs, so that the normal distributions are not the same for y=0 and y=1.

This is what I did now, but the accuracy drops from 50% to 3% during the course of training:

x_m = torch.empty(10000,1, dtype=torch.float)
x_c = torch.empty(10000,1, dtype=torch.float)

for t in range(batchlen):
    if y[t] == 0:
        x_m[t] = torch.normal(-1, 1.3, size=(1,1))
        x_c[t] = torch.normal(-1, 0.1, size=(1,1))
    elif y[t] == 1:
        x_m[t] = torch.normal(1, 1.3, size=(1,1))
        x_c[t] = torch.normal(1, 0.1, size=(1,1))

Thank you so much! Really appreciate your help.

Hi atl!

It sounds like you are calculating your accuracy backwards.

If you’re getting an accuracy of 3% on a binary problem, then you
could just as well be making predictions with 97% accuracy by
swapping your “0” predictions for “1” and vice versa.

Is your loss decreasing sensibly as you train?

The way you construct x should work, but it would be a lot cleaner
(and more efficient) if you would get rid of the for and if statements.

You can do it all with pytorch tensor operations, which, in general, is
to be preferred.

I think I would do something like:

x_m = (1 - y) * torch.normal (-1.0, 1.3, size = (10000, 1)) + y * torch.normal (1.0, 1.3, size = (10000, 1))
x_c = (1 - y) * torch.normal (-1.0, 0.1, size = (10000, 1)) + y * torch.normal (1.0, 0.1, size = (10000, 1))
x = torch.cat ((x_m, x_c), 1)

This has the minor disadvantage that you generate twice as many
random samples as you need, throwing half away, but it looks simpler
and easier to read to me than the fancier (and more efficient)
alternatives.
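
(For reference, one such alternative, just a sketch, relies on
torch.normal() accepting a tensor of per-element means, so each sample
is drawn only once:

x_m = torch.normal(2.0 * y - 1.0, 1.3)   # mean is -1 where y == 0 and +1 where y == 1
x_c = torch.normal(2.0 * y - 1.0, 0.1)
x = torch.cat((x_m, x_c), 1)
)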

As an aside, notice that for the second of your two input values
(the x_c piece), the means of your two distributions (that is, for
y = 0 and y = 1) differ by about twenty standard deviations,
having in practice no overlap. Therefore it should be very easy
to construct a perfect classifier. After you get this working, it might
be fun to make the standard deviation of your x_c distributions
something like 1.0 so that the y = 0 and y = 1 distributions do,
in fact, overlap, and see how large an accuracy you can achieve
in practice with your network vs. the maximum theoretical accuracy.
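
(If you try that, a back-of-the-envelope way to get the maximum
theoretical accuracy, assuming the two features are independent
gaussians with means -1 / +1 and standard deviations sig_m and sig_c,
is to evaluate the accuracy of the Bayes-optimal linear classifier,
Phi (sqrt (1 / sig_m**2 + 1 / sig_c**2)). A sketch:

sig_m, sig_c = 1.3, 1.0
# Bayes-optimal accuracy for two symmetric gaussian classes with means
# -1 / +1 and per-feature standard deviations sig_m and sig_c
d = torch.sqrt(torch.tensor(1.0 / sig_m**2 + 1.0 / sig_c**2))
max_acc = torch.distributions.Normal(0.0, 1.0).cdf(d)
print(max_acc)   # roughly 0.90 for these values
)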

Best.

K. Frank

Hi K.Frank!

Thank you so much again for your help.

I implemented your version of creating the input nodes without the loops, works like a charm. Thank you again also for the long explanation - this is super helpful for learning!

Regarding the accuracy: yes, that is also what I thought. And the loss decreases from 0.7 to 0.001 during the course of training, so I figured there must be a mistake somewhere in my accuracy calculation.

# define accuracy function

def binary_acc(pred, target):
    pred_tag = torch.round(pred)

    correct_results_sum = (pred_tag == target).sum().float()
    acc = correct_results_sum/target.shape[0]
    acc = acc*100

    return acc

This is the function I use: I simply check where the (rounded) output matches the target. I tried it on two other toy tensors and it worked sensibly.
Can you spot my mistake?

Thank you so much!

Best,
atl

Hello atl!

Based on what you have said, you have taken the sigmoid()
out of your network. (Correct, because you changed your loss
function to BCEWithLogitsLoss().) But you have also taken
the sigmoid() out of your accuracy calculation.

Without the network sigmoid() the tensor pred will contain
values that can range from -inf to inf. (We generally call
these kinds of predictions logits.) So pred_tag will be integers
that also range, in principle, from -inf to inf, and will rarely
match your target of 0 or 1.

You could put the sigmoid() back in your accuracy calculation.
Or, probably a little bit nicer, compare

(pred > 0.0).long() == target

to test for correct predictions. (And make sure you understand
what this is doing, and why it works.)
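
For reference, binary_acc() rewritten along those lines could look
something like this (a sketch):

def binary_acc(pred, target):
    # pred are raw logits; sigmoid(pred) > 0.5 exactly when pred > 0.0,
    # so thresholding at zero avoids calling sigmoid() at all
    pred_tag = (pred > 0.0).float()

    correct_results_sum = (pred_tag == target).sum().float()
    acc = correct_results_sum / target.shape[0]
    return acc * 100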

Good luck.

K. Frank

Hi K.Frank,

thank you so much for this hint. Of course it couldn’t work without the sigmoid() in my accuracy function. Added it back in and everything works perfectly now.

Thank you so much for your comments & feedback! That was a great help and saved me days of debugging :slight_smile: