Need help with usage of weight with loss function

I am new to PyTorch and not 100% comfortable with its usage. I am working on a project to classify supernova photometric data into two classes: Type 1a and Not Type 1a.

My data is a collection of CSV files that I read into dataloaders for the train and test sets. The data is highly imbalanced, with a low number of NOT Type 1a examples. My labels are of size torch.Size([64, 2]) with [0, 1] for NOT type 1a and [1, 0] for Type 1a.

I am trying to use

import torch
import torch.nn as nn
import torch.optim as optim
rnn = model(input_size=12, output_size=2, hidden_dim=30, num_layers=2)
weights = torch.ones(63)
criterion = nn.CrossEntropyLoss(ignore_index=255, weight=torch.Tensor([0.7, 0.2]), reduction='mean')

optimizer = optim.SGD(rnn.parameters(), lr=0.001, momentum=0.9)

from tqdm import tqdm

for epoch in tqdm(range(5)):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        images, labels = data
        
        labels = labels.long()
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs, hidden = rnn(images, None)

        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

and I am getting the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [79], in <cell line: 3>()
     14 # forward + backward + optimize
     15 outputs, hidden = rnn(images, None)
---> 17 loss = criterion(outputs, labels)
     18 loss.backward()
     19 optimizer.step()

File ~/.conda/envs/datascience/lib/python3.9/site-packages/torch/nn/modules/module.py:1102, in Module._call_impl(self, *input, **kwargs)
   1098 # If we don't have any hooks, we want to skip the rest of the logic in
   1099 # this function, and just call forward.
   1100 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102     return forward_call(*input, **kwargs)
   1103 # Do not call functions when jit is used
   1104 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.conda/envs/datascience/lib/python3.9/site-packages/torch/nn/modules/loss.py:1150, in CrossEntropyLoss.forward(self, input, target)
   1149 def forward(self, input: Tensor, target: Tensor) -> Tensor:
-> 1150     return F.cross_entropy(input, target, weight=self.weight,
   1151                            ignore_index=self.ignore_index, reduction=self.reduction,
   1152                            label_smoothing=self.label_smoothing)

File ~/.conda/envs/datascience/lib/python3.9/site-packages/torch/nn/functional.py:2846, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)
   2844 if size_average is not None or reduce is not None:
   2845     reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2846 return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)

RuntimeError: weight tensor should be defined either for all or no classes

I am using the following code to evaluate:

# since we're not training, we don't need to calculate the gradients for our outputs
accuracy = []
y_true = []
y_pred = []

with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs, hidden = rnn(images, None)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        _, y_pred_tags = torch.max(predicted, dim = 1)  
        _, y_test_tag= torch.max(labels, dim = 1)
        
        # Collecting true and pred values
        y_true += y_test_tag
        y_pred += y_pred_tags
        
        correct_pred = (y_pred_tags == y_test_tag).float()
        acc = correct_pred.sum() / len(correct_pred)
        accuracy.append(torch.round(acc * 100))

print(f'Accuracy of the network on the 4263 test images: {np.round(np.mean(accuracy))} %')

I would appreciate help with the error and would be very grateful for an explanation.

Thanks

Hello, that sounds like a cool problem!

My labels are of size torch.Size([64, 2]) with [0, 1] for NOT type 1a and [1, 0] for Type 1a.

CrossEntropyLoss expects the targets/labels to be class indices, rather than one-hot encoded vectors. So, if I understand your setup correctly, you’ll want the Type 1a label to be 0 and the NOT Type 1a label to be 1 (or vice versa), rather than the vectors [0, 1] and [1, 0]. Try that and see if it fixes the error; it might be as easy as doing this:

loss = criterion(outputs, torch.argmax(labels, dim=1))
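To make the shapes concrete, here is a minimal sketch of that conversion on made-up data. The tensors `logits` and `one_hot` are hypothetical stand-ins for the real model outputs and labels, and the weight values are simply copied from the question. It assumes the model output has shape [batch, num_classes], which may not match the actual RNN output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 2 classes (stand-ins for the real outputs/labels).
logits = torch.randn(4, 2)                                 # shape [batch, num_classes]
one_hot = torch.tensor([[1, 0], [0, 1], [1, 0], [1, 0]])   # shape [batch, num_classes]

# Convert one-hot rows to class indices: [1, 0] -> 0, [0, 1] -> 1.
targets = torch.argmax(one_hot, dim=1)                     # shape [batch], dtype int64

# The weight tensor needs one entry per class, i.e. logits.size(1) entries.
# The values are the ones from the question, not a recommendation.
criterion = nn.CrossEntropyLoss(weight=torch.tensor([0.7, 0.2]))
loss = criterion(logits, targets)                          # scalar tensor
```

With class-index targets, the number of classes is read from dimension 1 of the model output, so the weight tensor has to have exactly that many entries.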

It’s unclear why you’re passing ignore_index=255 to your loss function; what’s the rationale there? If you only have 2 classes, class 255 should never show up. This shouldn’t cause your code to malfunction, but it seems out of place.

Also, you may want to remove weights = torch.ones(63) to avoid confusion as it doesn’t seem to be used anywhere.

Btw, an alternative way to deal with the imbalance is to over-sample the NOT Type 1a class with a weighted random sampler, which would allow you to use a vanilla loss function with no weight argument.
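As a rough illustration of the over-sampling idea, the sketch below builds a torch.utils.data.WeightedRandomSampler on toy data. The dataset, the 90/10 split, and the batch size are assumptions made up for the example. Each sample is weighted by the inverse frequency of its class, which is one common choice rather than the only one.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Toy imbalanced data: 90 samples of class 0, 10 of class 1 (illustrative only).
features = torch.randn(100, 12)
labels = torch.cat([torch.zeros(90, dtype=torch.long),
                    torch.ones(10, dtype=torch.long)])
dataset = TensorDataset(features, labels)

# Weight each sample by the inverse frequency of its class, so that both
# classes carry the same total sampling probability.
class_counts = torch.bincount(labels, minlength=2)
sample_weights = 1.0 / class_counts[labels].double()

# Draw len(dataset) samples per epoch, with replacement so that the
# minority class can be repeated.
sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(dataset),
                                replacement=True)
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

# Inspect what one pass over the loader actually yields.
drawn = torch.cat([y for _, y in loader])
drawn_counts = torch.bincount(drawn, minlength=2)
print(drawn_counts)
```

Note that a sampler and shuffle=True cannot be passed to the same DataLoader; the sampler already decides the order.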

Thanks for the reply.
I understand. I used ignore_index=255 and weights = torch.ones(63) out of ignorance rather than necessity.

I tried your suggestion and changed the loss computation to

loss = criterion(outputs, torch.argmax(labels, dim=1))

It ended with another error. I think I need to read more about the usage of the RNN model before I get to the point of troubleshooting.

I will try to educate myself using the weighted random sampler resource you gave, and will come back for more help if needed.

Thanks a lot for your suggestion and time.

Thanks @Andrei_Cristea, it did solve the error completely, but it did not affect the confusion matrix result, so the data imbalance is still dominating the training.

Could anyone please suggest how to deal with it?

Thanks

Just to check: you implemented weighted random sampler and confirmed that the training data you get is balanced between the two classes the way you want it?
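One way to run that check is to count the labels a loader yields over a full pass. The helper below, `count_classes`, is a hypothetical name introduced for this sketch, and it assumes one-hot labels of shape [batch, 2] as described in the question. It is demonstrated on a toy loader without a sampler, so the counts simply reflect the raw imbalance.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def count_classes(loader, num_classes=2):
    """Count how many samples of each class a loader yields in one pass."""
    counts = torch.zeros(num_classes, dtype=torch.long)
    for _, labels in loader:
        # Labels are assumed to be one-hot, shape [batch, num_classes].
        indices = torch.argmax(labels, dim=1)
        counts += torch.bincount(indices, minlength=num_classes)
    return counts


# Toy stand-in for trainloader: 90 rows of [1, 0] and 10 rows of [0, 1].
one_hot = torch.cat([torch.tensor([[1, 0]]).repeat(90, 1),
                     torch.tensor([[0, 1]]).repeat(10, 1)])
toy_loader = DataLoader(TensorDataset(torch.randn(100, 12), one_hot),
                        batch_size=64, shuffle=True)

counts = count_classes(toy_loader)
print(counts)  # index 0 = [1, 0] labels, index 1 = [0, 1] labels
```

Calling the same helper on the real training loader would show whether the sampler is producing roughly equal counts for the two classes.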