Binary classification Different input sizes error

Tornike · April 27, 2021, 1:10pm

Hello, I am doing binary classification with BCELoss().

This is my dataset's __getitem__():

def __getitem__(self, index):
    # every column except the last is a feature; the last column is the label
    x = torch.tensor(self.df.iloc[index, :-1])
    y_label = torch.tensor(self.df.iloc[index, -1])
    return (x.float(), y_label.float())

this is my Model


class Credit_Module(nn.Module):
    def __init__(self):
        super().__init__()
        # If I have 20 columns, should I give the first layer 20 input features?
        self.fc1=nn.Linear(20,50)
        self.fc2=nn.Linear(50,80)
        self.fc3=nn.Linear(80,20)
        self.fc4=nn.Linear(20,2)
        
    
    def forward(self,x):
        x=F.relu(self.fc1(x))
        x=F.relu(self.fc2(x))
        x=F.relu(self.fc3(x))
        x=self.fc4(x)
        print(x.shape)
        return x

This is The Error



                    y_pred=model(X_train)
     11             print(y_train.shape)
---> 12             loss=criterion(y_pred,y_train)
ValueError: Using a target size (torch.Size([10])) that is different to the input size (torch.Size([10, 2])) is deprecated. Please ensure they have the same size.

The sizes are different. How can I make them the same size?
Should I change the loss function, or something in my model?
The train loader batch size is 10.

RaLo4 · April 27, 2021, 1:58pm

I don’t really know what you are trying to do here using BCELoss, so I will guess according to the info you have given.
Since you said you want to do binary classification and your target is of size torch.Size([10]), I’m guessing it is filled with ones and zeros and you want your network to predict a number between 0.0 and 1.0 aka a probability.
For this to happen you need to change the following:

self.fc4=nn.Linear(20,2)
to self.fc4=nn.Linear(20,1)
and

return x
to return x.view(-1, 1).squeeze(1)

This will give you logits as output, which can fall outside the range 0-1.
So if you want the probability (0-1) instead of the logits, you need to either add an nn.Sigmoid() layer to the end of your network or use nn.BCEWithLogitsLoss instead of nn.BCELoss, which is just BCELoss with a built-in sigmoid.
Using nn.BCEWithLogitsLoss is the recommended option. You can read about it in the PyTorch documentation for nn.BCEWithLogitsLoss.
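
The two changes above can be sketched as follows. This is only an illustration under assumptions: the class is renamed CreditModule, the n_features parameter is added for convenience, and X_train / y_train are random stand-ins for a real batch of 10 rows.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CreditModule(nn.Module):
    def __init__(self, n_features=20):  # n_features is an assumed parameter
        super().__init__()
        self.fc1 = nn.Linear(n_features, 50)
        self.fc2 = nn.Linear(50, 80)
        self.fc3 = nn.Linear(80, 20)
        self.fc4 = nn.Linear(20, 1)  # one logit per sample instead of two

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        # [batch, 1] -> [batch], so the output matches the target shape
        return self.fc4(x).squeeze(1)


torch.manual_seed(0)
model = CreditModule()
X_train = torch.randn(10, 20)                  # made-up features, batch size 10
y_train = torch.randint(0, 2, (10,)).float()   # made-up 0/1 labels as floats

criterion = nn.BCEWithLogitsLoss()             # applies the sigmoid internally
y_pred = model(X_train)                        # raw logits, shape [10]
loss = criterion(y_pred, y_train)
print(y_pred.shape, y_train.shape)
```

Both shapes print as torch.Size([10]), which is what the loss function needs.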

Tornike · April 27, 2021, 3:33pm

Thanks for your feedback.
I did what you advised and it worked.
But when I evaluate the model it always returns values like -1.something, so the predicted value never becomes 0.

self.fc4=nn.Linear(20,2)

I had that 2 because I wanted to return a tensor with the probabilities of the 2 classes and take the higher of the two as the prediction.
Again, what you advised worked, but it always predicts class ‘1’.

RaLo4 · April 27, 2021, 3:52pm

Like I wrote in the above answer, I just assumed you had 1 class because your target tensor is of size torch.Size([10]).
If you want to have two classes, then this is where your problem lies and not with the model.
If you have 2 classes, the target tensor would have to be of size torch.Size([10, 2]) (one column per class).

Just make sure you really want 2 classes. If you want a binary classification where the final output is either True or False, then a one-dimensional output with a probability between 0.0 and 1.0 (which you get by putting the model's output into a sigmoid function) is what most people would use.

If you still want to have 2 classes, then your first model was totally fine as it was.
You just need your target to have 2 classes (be 2-dimensional) as well.
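
As a rough sketch of what a 2-dimensional target could look like, the labels can be one-hot encoded so that they match a [10, 2] model output. The label values and the random logits below are made up for illustration; F.one_hot expects integer (long) labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up class indices for a batch of 10 (dtype is int64 by default)
y_train = torch.tensor([0, 1, 1, 0, 1, 0, 0, 1, 1, 0])

# One column per class: shape [10] -> [10, 2]
y_onehot = F.one_hot(y_train, num_classes=2).float()

# Stand-in for the output of a model whose last layer is nn.Linear(20, 2)
logits = torch.randn(10, 2)
probs = torch.sigmoid(logits)      # nn.BCELoss needs inputs in the range 0-1

loss = nn.BCELoss()(probs, y_onehot)
print(probs.shape, y_onehot.shape)
```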

Tornike · April 27, 2021, 4:23pm

Thanks a lot.
I’m new to deep learning so it is still a little bit confusing for me.
Just to make sure, the classification should return either good ‘0’ or bad ‘1’ credit risk. I thought it was working like I said, but your answer makes more sense.

So I did as you said and it worked, but I have very low accuracy because it always returns one class. Maybe the problem is in the dataset, I don’t know.

Anyways, Thanks for your time.

RaLo4 · April 29, 2021, 7:22am

No problem.

I just want to quickly summarize everything for you, in order to not confuse you.

Your network outputs so-called logits, which are not restricted to the range 0-1.

These logits can be put into a 0-1 range by either passing them through an nn.Sigmoid() layer at the end of your net, or by passing the logits through torch.sigmoid().

If you use nn.BCELoss as your loss, you need the nn.Sigmoid() layer at the end of your net. But it is recommended to use nn.BCEWithLogitsLoss as your loss and not have an nn.Sigmoid() layer at the end of your net.

When using the recommended nn.BCEWithLogitsLoss, torch.sigmoid() is used during testing/inference. Since nn.BCEWithLogitsLoss takes in logits, you build your net to output logits, but during testing/inference we want the 0-1 probability, so we pass the net's logits through torch.sigmoid().
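
A minimal sketch of that inference step, assuming the net returns one logit per sample. The logit values and the 0.5 cut-off are illustrative assumptions, not something fixed by PyTorch.

```python
import torch

# Made-up logits, standing in for model(X_test)
logits = torch.tensor([-2.0, 0.0, 3.0])

with torch.no_grad():
    probs = torch.sigmoid(logits)     # map logits into the range 0-1
    preds = (probs > 0.5).long()      # assumed threshold of 0.5

print(probs)
print(preds)   # tensor([0, 0, 1])
```

A raw output such as -1.something is therefore a logit, not a probability; after the sigmoid it lands below 0.5 and would be read as class 0.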

Hope this helps

Tornike · April 29, 2021, 10:42am

I did as you recommended, with nn.BCEWithLogitsLoss and torch.sigmoid at testing, and it is working as it is supposed to.

Once again, thank you very much for the great, deep explanation.

bbasaran · April 29, 2021, 7:59pm

Hello @RaLo4 , I have the same problem but could not solve it with this. Would you help me, too? I have summarized my network below:

class Predictor(pl.LightningModule):
  def __init__(self, n_features):
    super().__init__()
    self.model = LSTM(n_features)
    self.criterion = nn.BCELoss()

  def forward(self, x, labels=None): 
    output = self.model(x)
    loss = 0
    if labels is not None:
      loss = self.criterion(output, labels)
    return loss, output

class LSTM(nn.Module): 
  def __init__(self, n_features, n_hidden=256, n_layers=3):        
    super().__init__()
    self.lstm = nn.LSTM( ... )
    self.classifier = nn.Sigmoid()            
  
  def forward(self, x):
    self.lstm.flatten_parameters()
    _, (hidden, _) = self.lstm(x)
    output = hidden[-1]
    return self.classifier(output)

This architecture causes the same error (batch size = 64):

Using a target size (torch.Size([64])) that is different to the input size (torch.Size([64, 256])) is deprecated. Please ensure they have the same size.

RaLo4 · April 30, 2021, 10:18am

This is extremely difficult to answer since I do not know what your input looks like, nor do I know what your desired output or labels look like.
But you can easily debug this yourself. All the error is telling you is that your network's output should be the same size as your labels, but it currently is not.
Your labels are a one-dimensional tensor of size [64] and your output is a two-dimensional tensor of size [64, 256].
So you need to make sure that the output of your model becomes the shape you want it to be (which is torch.Size([64]), guessing from the labels' shape).
You are using only the h_n output of PyTorch's LSTM.

So I checked the documentation, and for the shape of h_n it states:

h_n of shape (num_layers * num_directions, batch, hidden_size): tensor containing the hidden state for t = seq_len. If proj_size > 0 was specified, h_n shape will be (num_layers * num_directions, batch, proj_size). Like output, the layers can be separated using h_n.view(num_layers, num_directions, batch, hidden_size) and similarly for c_n.

You need to either adjust the network or change how you are handling the output (it depends on what you are trying to do) for it to work.
Debugging shapes is not so difficult. You can use something like print(output.shape) or print(output.size()) to print the shapes at different points in your code.
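
For example, a shape check on a stand-alone LSTM might look like the sketch below. The arguments of nn.LSTM were elided in the post, so input_size=8, the sequence length of 12 and batch_first=True are assumptions chosen only to make the snippet runnable.

```python
import torch
import torch.nn as nn

# Assumed configuration; only hidden_size=256 and num_layers=3 come from the post
lstm = nn.LSTM(input_size=8, hidden_size=256, num_layers=3, batch_first=True)

x = torch.randn(64, 12, 8)           # made-up input: (batch, seq_len, features)
output, (h_n, c_n) = lstm(x)

print(output.shape)    # torch.Size([64, 12, 256])
print(h_n.shape)       # torch.Size([3, 64, 256])

last_hidden = h_n[-1]  # hidden state of the top layer
print(last_hidden.shape)  # torch.Size([64, 256])
```

The last print shows where the [64, 256] in the error message comes from: hidden[-1] still has one value per hidden unit, not one value per sample.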

bbasaran · May 1, 2021, 3:07pm

Thank you for the answer @RaLo4 ! As you have said, the problem is the inconsistency between the shapes of “hidden” and “labels”. Actually, I have just explained my problem here better: Problem with converting my LSTM multi-label classification model to a binary classification model
I have been trying to migrate my multi-class classification model to a binary classification model. In the former case, I was using “CrossEntropyLoss” and “Linear” activation and there was no problem. As you can see, the problem arises when I change them to “BCELoss” and “Sigmoid” activation. Maybe instead of reshaping the output, I have to rewrite the forward method of the LSTM class from scratch, since it was written for a multi-class classification model?

RaLo4 · May 3, 2021, 7:05am

I have looked at the other post you linked.
Having read what you posted there, I think you were already on the right track.

I am not sure that what I am going to say now is “the final answer”, but concerning the shape issue, it would work if you combine your previous classifier (the nn.Linear layer) with your current one (nn.Sigmoid()).

So it would be something like:

self.classifier = nn.Sequential(nn.Linear(n_hidden, n_classes),
                                nn.Sigmoid())

With one more change!!!
Change n_classes from 2 to 1. This might sound weird because you have two classes, 0 and 1.
But as I explained for Tornike's problem above, if you have a binary classification with labels being 0 and 1, you normally want your output to be a probability in the range 0-1.

If this works, you can further optimize it by taking out the nn.Sigmoid() and switching to nn.BCEWithLogitsLoss() as the loss function, as recommended by PyTorch.
You can read about it above.
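
Put together, the suggestion could look roughly like the sketch below, already using the logits variant with nn.BCEWithLogitsLoss. The class name LSTMBinary, the nn.LSTM arguments, n_features=8 and the random batch are assumptions, because the original constructor call was elided.

```python
import torch
import torch.nn as nn


class LSTMBinary(nn.Module):  # hypothetical name for the adjusted LSTM class
    def __init__(self, n_features, n_hidden=256, n_layers=3):
        super().__init__()
        # Assumed arguments; the original post only shows nn.LSTM( ... )
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=n_hidden,
                            num_layers=n_layers, batch_first=True)
        self.classifier = nn.Linear(n_hidden, 1)  # n_classes set to 1

    def forward(self, x):
        _, (hidden, _) = self.lstm(x)
        logits = self.classifier(hidden[-1])      # [batch, 1]
        return logits.squeeze(1)                  # [batch]


torch.manual_seed(0)
model = LSTMBinary(n_features=8)
x = torch.randn(64, 12, 8)                        # made-up batch of sequences
labels = torch.randint(0, 2, (64,)).float()       # made-up 0/1 labels

logits = model(x)
loss = nn.BCEWithLogitsLoss()(logits, labels)     # sigmoid is applied inside the loss
print(logits.shape, labels.shape)
```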

bbasaran · May 7, 2021, 11:29am

Thanks a lot for your detailed answer @RaLo4

In the last couple of days, I decided to leave my model as a multi-class classifier, because besides using the same dataset with classes only 0-1, I can diversify my dataset like classes 0-1-2… in the following days. As far as I understood, even if I build the model as a multi-class classifier, I can feed it with a dataset containing only 0-1 labels. I am planning to use again CrossEntropyLoss, but my only problem right now is changing Linear Activation to Softmax.
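
One point that may be worth checking before adding a Softmax layer: nn.CrossEntropyLoss expects raw logits together with integer class indices and applies log-softmax internally, so a softmax is typically only needed when probabilities are wanted at inference time. A small sketch with made-up tensors, assuming a final nn.Linear layer with 2 outputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

logits = torch.randn(64, 2)              # stand-in for the Linear layer's output
labels = torch.randint(0, 2, (64,))      # class indices (int64), not one-hot

# Training: pass the raw logits straight to the loss
loss = nn.CrossEntropyLoss()(logits, labels)

# Inference: softmax only to read the outputs as probabilities
probs = torch.softmax(logits, dim=1)     # each row sums to 1
preds = probs.argmax(dim=1)              # predicted class per sample
print(probs.shape, preds.shape)
```

The same code would carry over to classes 0-1-2… by changing the number of outputs of the final layer.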