I think the main reason your model does not learn the XOR problem is that it’s a linear model.
Try adding a non-linearity between the two linear layers. You could also try changing the loss to nn.MSELoss, which might learn better.
Have a look at this chapter from the deeplearning book.
Goodfellow et al. explain this problem pretty clearly. They also mention why a linear network won’t be able to learn the representation.
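If it helps to see the argument in numbers, here is a small sketch in plain Python (no PyTorch needed). It assumes your original model is two linear layers with nothing in between, which collapses to a single affine map f(x1, x2) = w1*x1 + w2*x2 + b. The helper names `affine` and `diagonal_gap` are made up for this illustration.

```python
import random

def affine(w1, w2, b, x1, x2):
    # What any stack of purely linear layers reduces to for two inputs.
    return w1 * x1 + w2 * x2 + b

def diagonal_gap(w1, w2, b):
    # (f(0,0) + f(1,1)) - (f(0,1) + f(1,0)); algebraically this is always 0.
    same = affine(w1, w2, b, 0, 0) + affine(w1, w2, b, 1, 1)
    diff = affine(w1, w2, b, 0, 1) + affine(w1, w2, b, 1, 0)
    return same - diff

random.seed(0)
gaps = [
    diagonal_gap(random.uniform(-5, 5), random.uniform(-5, 5), random.uniform(-5, 5))
    for _ in range(1000)
]
# Largest deviation from 0 over 1000 random affine maps (floating-point noise only).
max_gap = max(abs(g) for g in gaps)

# An exact XOR fit would need f(0,0) = f(1,1) = 0 and f(0,1) = f(1,0) = 1.
xor_gap = (0 + 0) - (1 + 1)
```

The gap is zero for every choice of weights, while XOR would need it to be -2, so no setting of w1, w2, b can fit all four points. A sigmoid on the output does not rescue this either, since it only squashes the same affine value monotonically.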
This code should work for you:
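import torch.nn as nn
import torch.optim as optim

# input_dim, hidden_dim and output_dim are the values from your own script
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.ReLU(),  # the non-linearity between the two linear layers
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())
model.cuda()  # needs a CUDA GPU; drop this line to train on the CPU
criterion = nn.MSELoss()
learning_rate = 1e-3
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000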
You might want to try a few seeds, since this problem is sometimes a bit sensitive to the initialization.
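To compare seeds, something along these lines might be a starting point. It is only a sketch under a few assumptions that are not from your post: `train_xor` is a hypothetical helper name, `hidden_dim=2` is a guess, the four XOR points are hard-coded, and everything runs on the CPU so the runs are easy to reproduce.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# The four XOR input/target pairs.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

def train_xor(seed, hidden_dim=2, learning_rate=1e-3, num_epochs=10000):
    """Train the small ReLU network from one seed and return the last computed loss."""
    torch.manual_seed(seed)  # fixes the random weight initialization
    model = nn.Sequential(nn.Linear(2, hidden_dim),
                          nn.ReLU(),
                          nn.Linear(hidden_dim, 1),
                          nn.Sigmoid())
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate)
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()
    return loss.item()
```

Calling `train_xor(seed)` for a handful of seeds and comparing the returned losses should show how much the initialization matters. If most seeds stay stuck, the hidden size and the learning rate are the first things I would vary.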