My network threw a shape-mismatch error during backprop, and I wondered how and why the forward pass had succeeded without raising one. After investigating, I've discovered some odd behavior.
If the model is on the CPU, all is well:
import torch
import torch.nn as nn

model = nn.Linear(100, 1)
x = torch.randn(1, 200)
model(x)
This raises RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x200 and 100x1), as expected.
However, if I put it on the GPU:
device = 'cuda:0'
model.to(device)
model(x.to(device))
it happily computes without throwing an error.
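In the meantime, a defensive check can make the mismatch fail loudly on any device. Below is a minimal sketch; checked_forward is a hypothetical helper I wrote for illustration, not part of PyTorch, and it simply compares the input's trailing dimension against the layer's in_features before calling it.

```python
import torch
import torch.nn as nn

def checked_forward(model: nn.Linear, x: torch.Tensor) -> torch.Tensor:
    # Raise explicitly if the trailing dimension doesn't match in_features,
    # regardless of which device the model and input live on.
    if x.shape[-1] != model.in_features:
        raise RuntimeError(
            f"expected {model.in_features} input features, got {x.shape[-1]}"
        )
    return model(x)
```

With the example above, checked_forward(model, torch.randn(1, 200)) raises on CPU and GPU alike, while a correctly shaped (1, 100) input passes through unchanged.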
Is this expected behavior?