Weights do not update

Hi, I’m new to pytorch.
I’m developing a weak learner for an ensemble approach. I defined a weak learner as a neural network with just one neuron and one weight:

import torch.nn as nn
from torch import Tensor
from torch.nn import LeakyReLU, Linear

class BaseLearnerFST(nn.Module):

    def __init__(self, leakyReluM: float = 0.02):
        super().__init__()
        # single learnable weight, no bias
        self._weightApplier: Linear = Linear(1, 1, bias=False)
        # activation function
        self._leakyRelu: LeakyReLU = LeakyReLU(negative_slope=leakyReluM)

    def forward(self, batch: Tensor) -> Tensor:
        weightedInput: Tensor = self._weightApplier(batch)
        return self._leakyRelu(weightedInput)

Since each sample in the dataset has a specific weight, I developed a custom loss to handle the error computation:

import torch as th
import torch.nn as nn
from torch import Tensor, float32, gt

class WeightedMeanSquaredError(nn.Module):

    def __init__(self):
        super().__init__()
        self._errorMap: list[Tensor] = []

    def forward(self, yTrue: Tensor, yPred: Tensor, weights: Tensor, save: bool = False) -> Tensor:
        # hard threshold: 1.0 where yPred > 1.0, else 0.0
        modelLabel: Tensor = gt(yPred, 1.0).float()
        absolute_diff: Tensor = th.abs(modelLabel - yTrue)
        weighted_absolute_diff = weights * absolute_diff
        if save:
            self._errorMap.append(absolute_diff)

        return th.sum(weighted_absolute_diff, dtype=float32)

    def getErrorMap(self) -> list[Tensor]:
        return self._errorMap

I also developed a fit() method to train a BaseLearnerFST:

    def fit(
            self, xTrain: ndarray[float], yTrain: ndarray[int],
            weights: ndarray[float], batchSize: int = 1, epochs: int = 10
    ) -> Tensor:

        optimizer = SGD(self.parameters(), lr=0.05)  
        weightedLoss = WeightedMeanSquaredError()
        dataset: CustomDataset = CustomDataset(xTrain, yTrain, weights)
        batches: DataLoader = BaseLearnerFST.prepareDataset(dataset, batchSize)
        self.train()

        for epoch in range(epochs):
            for batch in batches:
                print("Weight: ", self._weightApplier.weight)
                xBatch, (yBatch, weightsBatch) = batch
                optimizer.zero_grad()  # initialize gradient to zero
                yPred: Tensor = self(xBatch)  # forward pass

                loss: Tensor = weightedLoss(
                    yBatch, yPred, weightsBatch,
                    save=(epoch == epochs - 1)  # save the error map only on the last epoch
                )

                loss.backward()  # backpropagation: compute gradients
                optimizer.step()  # update parameters using the gradients

            print(f"Epoch:{epoch} loss is {loss.item()}")

        return tensor([])  # th.cat(weightedLoss.getErrorMap())

    @staticmethod
    def prepareDataset(dataset: Dataset, batchSize: int = 1):
        return DataLoader(dataset, batchSize, shuffle=True)

The problem is that throughout the training process the single weight never gets updated. I know the problem is inside the custom loss function I wrote, because when I tried a built-in pytorch loss such as MSELoss, everything worked fine.

I read that if the training-set tensors are not created with the requires_grad flag, gradients do not get computed, so I also attach the class with which I handle the dataset before the training phase:

from numpy import ndarray
from torch import Tensor, float32, tensor
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data: ndarray[float], labels: ndarray[int], weights: ndarray[float]):
        self._xTrain: Tensor = tensor(data, dtype=float32, requires_grad=True)
        self._yTrain: Tensor = tensor(labels, dtype=float32, requires_grad=True)
        self._weights: Tensor = tensor(weights, dtype=float32, requires_grad=True)

    def __len__(self) -> int:
        return len(self._xTrain)

    def __getitem__(self, idx: int) -> tuple[Tensor, tuple[Tensor, Tensor]]:
        return self._xTrain[idx], (self._yTrain[idx], self._weights[idx])

Hi Gio!

I assume that gt() is some sort of “greater-than” function that returns
a boolean (or perhaps integer) tensor that you then convert to float().
Pytorch’s autograd does not backpropagate through booleans (or integers),
because, being discrete, they are not usefully differentiable.

Do you really need to threshold yPred? Could you work directly with
yPred - yTrue? If you need to backpropagate through a thresholding
operation, you would have to use a differentiable, “soft” approximation
to a hard threshold, such as sigmoid().
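
For illustration, here is a minimal sketch of what such a differentiable variant of your weighted loss could look like, with the hard gt() threshold replaced by a sigmoid. (The function name and the sharpness factor k are arbitrary choices for this sketch, not part of your original code.)

import torch as th
from torch import Tensor

def softThresholdLoss(yTrue: Tensor, yPred: Tensor, weights: Tensor) -> Tensor:
    # th.sigmoid(k * (yPred - 1.0)) smoothly approximates gt(yPred, 1.0).float();
    # a larger k gives a sharper step but steeper, less stable gradients
    k = 10.0
    softLabel: Tensor = th.sigmoid(k * (yPred - 1.0))
    # same weighted absolute difference as before, now differentiable in yPred
    return th.sum(weights * th.abs(softLabel - yTrue))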

Best.

K. Frank

Thank you Frank,

I really appreciate your support; now I know what I was doing wrong. I will probably stick to your solution of using yPred directly, with a little bit of approximation. I will probably also try changing either the activation function or the net architecture.

One thing which is not directly a pytorch question: my classifier only needs to learn whether a value is greater than 1. I really don’t care how far above or below 1 we are; I just need to know whether I’m over or under that threshold. I should also say that I’m trying to solve a binary classification problem (0: non face, 1: face), so I’m worried about being too strict with a loss function like that. For example, if I have a face that my weak learner classifies as 1.5 (where its ground truth is 1), and this instance has a weight of 0.4, then the loss would be 0.4 * |1.5 - 1| = 0.2. In this scenario my weak classifier performed well (because the output was greater than 1), but I’m still increasing the loss.

What would you do?

Wishing you the best,

skippa

Hi Skippa!

At inference time – that is, when you are not training, but just making
predictions – you (typically) want to come up with a yes-no answer. So
for inference, thresholding makes sense. One measure of how well you
are making predictions is the so-called accuracy, namely the percentage
of the predictions that are correct.

However, there is no need to use accuracy or some other metric similar
to accuracy as your training loss, and it is very common to use a training
loss that is quite different in character from an accuracy-like metric.

For a normal binary-classification problem the most common, and likely
the best, training loss to use is binary cross entropy. You, however, are
training very simple weak learners, so the common approach may or may
not be the best choice for your use case. Nevertheless, I would suggest
that you give pytorch’s binary_cross_entropy_with_logits() a try. (Note that
it has a weight argument, so you won’t have to write your own version to
apply your sample weights.)

To use binary_cross_entropy_with_logits(), you would pass it the output
of your BaseLearnerFST with no thresholding (or other further processing)
and your ground-truth 0.0 / 1.0 binary labels.
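
As a rough sketch, assuming yPred is the raw output of your BaseLearnerFST and yBatch / weightsBatch are the labels and sample weights from your fit() loop (with matching shapes), the loss computation would look something like:

import torch.nn.functional as F

# yPred: raw model output, treated as a logit (no thresholding applied)
# yBatch: ground-truth labels as 0.0 / 1.0 floats, same shape as yPred
# weightsBatch: per-sample weights, applied via the weight argument
loss = F.binary_cross_entropy_with_logits(yPred, yBatch, weight=weightsBatch)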

Part of the reason I suggested binary_cross_entropy_with_logits() is that it
takes predictions that are so-called logits, which range from -inf to inf.
A large negative logit corresponds to a probability of being in the “1” class
that is very close to zero, while a large positive logit corresponds to a
probability for the “1” class that is very close to one, and a logit of 0.0
corresponds to a probability of one half. (Note that this means that to
produce a yes-no prediction, you would typically threshold a logit
against 0.0.)
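
A small sketch of that inference-time step (learner and xBatch are placeholder names here for your trained model and an input batch):

import torch as th

with th.no_grad():
    logits = learner(xBatch)             # raw model output, treated as logits
    probs = th.sigmoid(logits)           # probability of the “1” (face) class
    prediction = (logits > 0.0).float()  # 1.0 = face, 0.0 = non face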

Best.

K. Frank
