Your use case mixes two workflows for binary classification. You could either:

- use two output units + `nn.CrossEntropyLoss` and a target of shape `[batch_size]` containing the class indices, or
- use a single output unit + `nn.BCEWithLogitsLoss` and a target of shape `[batch_size, 1]`.

Neither use case applies a softmax activation at the end, as both criteria apply an activation function internally (log-softmax and sigmoid, respectively), so you should remove the softmax.
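Both setups can be sketched with random tensors; the batch size and logit values here are arbitrary placeholders:

```python
import torch
import torch.nn as nn

batch_size = 8

# Option 1: two output units + nn.CrossEntropyLoss.
# The target contains class indices (dtype long) with shape [batch_size].
logits_two = torch.randn(batch_size, 2)          # raw logits, no softmax
target_idx = torch.randint(0, 2, (batch_size,))  # values in {0, 1}
loss1 = nn.CrossEntropyLoss()(logits_two, target_idx)

# Option 2: a single output unit + nn.BCEWithLogitsLoss.
# The target contains floats with shape [batch_size, 1].
logits_one = torch.randn(batch_size, 1)          # raw logits, no sigmoid
target_float = torch.randint(0, 2, (batch_size, 1)).float()
loss2 = nn.BCEWithLogitsLoss()(logits_one, target_float)

print(loss1.item(), loss2.item())
```

Note that the model's `forward` returns the raw logits in both cases; the criterion handles the activation.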
That being said, the shape mismatch is most likely created in `x = x.view(-1, self._to_linear)`. Use `x = x.view(x.size(0), -1)` instead to keep the batch dimension constant. This could then yield a shape mismatch in the feature dimension, which you would fix by changing the `in_features` of the conflicting linear layer.
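A minimal sketch of this fix, assuming the conv block outputs an activation of shape `[batch_size, 16, 5, 5]` (the channel and spatial sizes are illustrative, not taken from your model):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 16, 5, 5)   # assumed conv output: [batch, channels, h, w]
x = x.view(x.size(0), -1)      # keep the batch dim; flatten the rest
print(x.shape)                 # torch.Size([8, 400])

# The next linear layer then needs in_features = 16 * 5 * 5 = 400.
lin = nn.Linear(400, 2)
out = lin(x)
print(out.shape)               # torch.Size([8, 2])
```

If the shapes don't line up, the error message will report the mismatched feature dimension, which tells you the value to use for `in_features`.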