Binary classification

Hi,
I am trying to build a (people vs. background) segmentation network.
I have a working network, but for distance estimation, and I plan to change it a little bit for the segmentation work.
Here is my distance model:


class network(nn.Module):
    # ... distance estimation layers ...

    # in __init__:
    self.scratch.output_conv = nn.Sequential(
        nn.Conv2d(features, features // 2, kernel_size=3, stride=1, padding=1, groups=self.groups),
        Interpolate(scale_factor=2, mode="bilinear"),
        nn.Conv2d(features // 2, 32, kernel_size=3, stride=1, padding=1),
        self.scratch.activation,
        nn.Conv2d(32, 1, kernel_size=1, stride=1, padding=0),
        nn.ReLU(True) if non_negative else nn.Identity(),
        nn.Identity(),
    )

    def forward(self, x):
        # ... distance estimation layers ...
        output = self.scratch.output_conv(...)
        return output

The output is a dense map, so to output either 0 or 1 for each pixel I made the following change (add a Sigmoid and binarize the output):

class network(nn.Module):
    # ... distance estimation layers ...

    # in __init__:
    self.scratch.output_conv = nn.Sequential(
        nn.Conv2d(features, features // 2, kernel_size=3, stride=1, padding=1, groups=self.groups),
        Interpolate(scale_factor=2, mode="bilinear"),
        nn.Conv2d(features // 2, 32, kernel_size=3, stride=1, padding=1),
        self.scratch.activation,
        nn.Conv2d(32, 1, kernel_size=1, stride=1, padding=0),
        nn.ReLU(True) if non_negative else nn.Identity(),
        nn.Identity(),
        nn.Sigmoid(),
    )

    def forward(self, x):
        # ... distance estimation layers ...
        out = self.scratch.output_conv(...)
        pred = out > 0.75   # threshold the probabilities
        pred = pred * 1.0   # boolean -> float
        return pred

During training, I use the loss function:

loss_f = nn.BCELoss()
est = model(image)
loss = loss_f(est, target)
loss.backward()

When I run it, I get:

  loss.backward()
  File "/home/bigtree/miniconda3/envs/bigtree/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/bigtree/miniconda3/envs/bigtree/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Please let me know if I did something wrong.
Thanks

Hi Bigtree!

The thresholding step (pred = out > 0.75) is your problem: it is not
usefully differentiable, so it breaks the computation graph, leading
to your error.

It is perfectly appropriate (and in fact necessary) to train with
continuous predictions that have not been thresholded (or otherwise
converted to a discrete result).

As an aside, you are better off, for reasons of numerical stability,
using BCEWithLogitsLoss (and no final Sigmoid).
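
Concretely, the training step could look something like this (a minimal
sketch, not your actual code -- model, image, and target are placeholders,
and the model is assumed to return the raw logit map, with no Sigmoid and
no thresholding in forward()):

import torch
import torch.nn as nn

loss_f = nn.BCEWithLogitsLoss()

out = model(image)           # raw logits, e.g. shape (N, 1, H, W)
loss = loss_f(out, target)   # target: same shape, float values in {0.0, 1.0}
loss.backward()              # works -- no threshold breaks the graph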

Best.

K. Frank

@KFrank
Thank you very much for your prompt reply.
Then how about inference?
Without the Sigmoid, out will be a random floating-point number that may be out of the range [0, 1].
What should I do to convert out to either 0 or 1 for each pixel when doing inference?

Thanks,

Hi Bigtree!

Well, it won’t be a random floating-point number. It will still be the
prediction of the network, but expressed as a raw-score logit that
runs from -inf to inf, rather than a probability between 0 and 1.

The question is how to convert a probabilistic prediction (whether
expressed as a logit or a probability) to a hard 0-1 prediction.

If it were a probability, we would typically threshold the probability
against 0.5 (e.g., p < 0.5 means a hard prediction of 0, while
p > 0.5 means a hard prediction of 1). This is equivalent to
thresholding the logit against 0.0: logit < 0.0 means 0 and
logit > 0.0 means 1. This is because a logit of 0.0
corresponds to a probability of 0.5, i.e. sigmoid (0.0) = 0.5.
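
A quick numerical check of that correspondence (assuming torch is
imported) would be:

import torch
torch.sigmoid(torch.tensor(0.0))   # tensor(0.5000)
torch.logit(torch.tensor(0.5))     # tensor(0.) -- logit() is the inverse of sigmoid()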

In your case, you chose to threshold your probability against 0.75. (There's
nothing wrong with doing this.) This corresponds to thresholding
your logit against log (0.75 / (1 - 0.75)) = 1.099 (which
is to say that sigmoid (1.099) = 0.75).

That is, leave out the Sigmoid (so that out is a logit), and calculate
your hard 0-1 prediction as

pred = out > log (0.75 / (1 - 0.75))

Equivalently you could pass out through sigmoid() after
calculating your loss with BCEWithLogitsLoss, and threshold
the result against 0.75:

pred = sigmoid (out) > 0.75
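
In PyTorch code (a sketch -- out here is assumed to be the logit map your
model returns, and 0.75 is just the threshold you picked):

import torch

# threshold the logits directly
threshold_logit = torch.log(torch.tensor(0.75 / (1 - 0.75)))   # ~1.0986
pred = (out > threshold_logit).float()

# or, equivalently, convert to probabilities first and threshold those
pred = (torch.sigmoid(out) > 0.75).float()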

Best.

K. Frank

@KFrank
Yes, this works.
Thank you very much for the explanation.
Really appreciate that.