Hi Richard!
As Prashanth notes, you could use BCELoss in place of CrossEntropyLoss. However, you'll be better off removing the torch.sigmoid() and using BCEWithLogitsLoss. Doing so will be mathematically the same, but numerically more stable.
Thus:
class Classifier(nn.Module):
    def __init__(self, in_dim):
        super(Classifier, self).__init__()
        self.classify = nn.Linear(in_dim, 1)

    def forward(self, features):
        final = self.classify(features)
        return final
and:
criterion = nn.BCEWithLogitsLoss()
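If you want to convince yourself that the two versions really compute the same loss, here is a quick, self-contained check (my own sketch, not from your code; the tensor values are made up):

import torch
import torch.nn as nn

logits = torch.tensor([2.3, -0.7, 0.1])      # raw outputs of the Linear layer
targets = torch.tensor([1.0, 0.0, 1.0])      # BCE-style targets are floats

loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_with_sigmoid = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_with_logits.item(), loss_with_sigmoid.item())   # equal up to floating-point error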
Your Classifier will now output raw-score logits that range from -inf to inf instead of probabilities. Should you need probabilities for subsequent processing, you can always pass the logits through sigmoid(). Note, you don't need probabilities to make hard 0-1 predictions: prediction = 1 if logit > 0.0 is the same as prediction = 1 if probability > 0.5.
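For concreteness, here is a small sketch (again with made-up numbers) showing that the two thresholding rules give identical hard predictions:

import torch

logits = torch.tensor([2.3, -0.7, 0.1])

probabilities = torch.sigmoid(logits)             # only if you actually need them
preds_from_logits = (logits > 0.0).long()         # threshold logits at 0.0
preds_from_probs = (probabilities > 0.5).long()   # threshold probabilities at 0.5

print(torch.equal(preds_from_logits, preds_from_probs))   # True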
Two side comments:
As written, you never call scheduler.step(), so scheduler doesn't do anything.
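For example, assuming an epoch-based scheduler such as StepLR and a conventional training loop (your actual scheduler, loop, and data loader may differ), the call typically goes once per epoch:

import torch

model = Classifier(in_dim=10)                 # the one-layer Classifier above; in_dim is hypothetical
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
criterion = torch.nn.BCEWithLogitsLoss()

for epoch in range(num_epochs):               # num_epochs and train_loader defined elsewhere
    for features, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(features).squeeze(1), targets.float())
        loss.backward()
        optimizer.step()
    scheduler.step()                          # without this call the scheduler never updates the lr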
For getting started with the code, one Linear layer is fine, but it won't be much of a classifier for anything but special toy problems. Leaving aside the sigmoid(), your single output is just a linear function of your in_dim inputs. Things already get much more interesting (and useful) if you add a single "hidden" layer:
class Classifier(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super(Classifier, self).__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.activation = nn.ReLU()   # for example
        self.fc2 = nn.Linear(hidden_dim, 1)

    def forward(self, features):
        x = self.fc1(features)
        x = self.activation(x)
        x = self.fc2(x)
        return x
For more interesting classification tasks, the non-linear activation (for example, ReLU) between fc1 and fc2 is the "secret sauce."
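As a quick usage sketch (with hypothetical sizes), the hidden-layer Classifier plugs into BCEWithLogitsLoss exactly like the one-layer version; just remember that the targets must be floats with the same shape as the logits:

import torch
import torch.nn as nn

model = Classifier(in_dim=20, hidden_dim=50)    # made-up dimensions
criterion = nn.BCEWithLogitsLoss()

features = torch.randn(8, 20)                   # batch of 8 samples
targets = torch.randint(0, 2, (8, 1)).float()   # 0.0 / 1.0 labels, shape [8, 1]

logits = model(features)                        # shape [8, 1]
loss = criterion(logits, targets)
predictions = (logits > 0.0).long()             # hard 0-1 predictions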
Best.
K. Frank