My network threw an error during backprop because layer shapes did not match, and I was puzzled that the forward pass had completed without any error. After investigating, I've discovered some weird behavior.
If the model is on the CPU, all is well:
import torch
import torch.nn as nn

model = nn.Linear(100, 1)
x = torch.randn(1, 200)
model(x)
throws RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x200 and 100x1), as expected.
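To make the expected CPU behavior concrete, here is a minimal self-contained check (the shapes and module are taken from the snippet above) that catches the RuntimeError rather than letting it propagate:

```python
import torch
import torch.nn as nn

# Linear(100, 1) expects inputs with a last dimension of 100.
model = nn.Linear(100, 1)

# A (1, 200) input is incompatible: the matmul would be (1x200) @ (100x1).
x = torch.randn(1, 200)

try:
    model(x)
    raised = False
except RuntimeError as e:
    raised = True
    print(e)  # mentions that mat1 and mat2 shapes cannot be multiplied

assert raised  # on CPU, the mismatch is caught at forward time
```

The check confirms that on CPU the shape mismatch surfaces immediately in the forward pass, which is the baseline against which the weird behavior below stands out.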