Your use case mixes different workflows for a binary classification.
You could either:
- use two output units + `nn.CrossEntropyLoss` and a target of shape `[batch_size]` containing the class indices, or
- use a single output unit + `nn.BCEWithLogitsLoss` and a target of shape `[batch_size, 1]`
Neither use case uses a softmax activation at the end, as both criteria apply the activation internally (log-softmax and sigmoid, respectively), so you should remove the softmax.
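For illustration, here is a minimal sketch of both options with random tensors standing in for your model output and targets (the names are just placeholders); in both cases the model returns raw logits:

```python
import torch
import torch.nn as nn

batch_size = 4

# Option 1: two output units + nn.CrossEntropyLoss
# (the criterion applies log-softmax internally, so no softmax on the model output)
logits_two = torch.randn(batch_size, 2)               # model output, shape [batch_size, 2]
target_indices = torch.randint(0, 2, (batch_size,))   # class indices, shape [batch_size]
loss_ce = nn.CrossEntropyLoss()(logits_two, target_indices)

# Option 2: a single output unit + nn.BCEWithLogitsLoss
# (the criterion applies sigmoid internally, so again no activation on the model output)
logits_one = torch.randn(batch_size, 1)               # model output, shape [batch_size, 1]
target_float = torch.randint(0, 2, (batch_size, 1)).float()  # float targets, shape [batch_size, 1]
loss_bce = nn.BCEWithLogitsLoss()(logits_one, target_float)

print(loss_ce.item(), loss_bce.item())
```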
That being said, the shape mismatch is probably created in:
`x = x.view(-1, self._to_linear)`
Could you use `x = x.view(x.size(0), -1)` instead to keep the batch dimension constant?
This could then yield a shape mismatch in the feature dimension, which you would need to fix by changing the `in_features` of the conflicting linear layer.
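A small sketch of the idea, assuming a hypothetical conv stack and 32x32 inputs (your layer and input sizes will differ); the important part is flattening with `x.size(0)` and matching `in_features` to the flattened feature count:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        # assuming 32x32 inputs: 16 channels * 16 * 16 spatial positions after one pooled conv
        self.fc = nn.Linear(16 * 16 * 16, 1)  # single output unit for nn.BCEWithLogitsLoss

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = x.view(x.size(0), -1)  # keep the batch dimension, flatten the rest
        return self.fc(x)          # raw logits, no sigmoid/softmax here

model = Net()
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 1])
```

If you are unsure about the flattened size, you can print `x.shape` right after the `view` call and set `in_features` accordingly.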