Unable to Learn XOR Representation using 2 layers of Multi-Layered Perceptron (MLP)

I think the main reason your model does not learn XOR is that it is a linear model: two linear layers stacked without an activation in between are equivalent to a single linear layer.
Try adding a non-linearity between the two linear layers. You could also try changing the loss to nn.MSELoss, which might train better here.
Have a look at this chapter from the Deep Learning book.
Goodfellow et al. explain this problem quite clearly, and they also show why a linear network cannot learn the representation.
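
To make the argument concrete, here is a small sketch in plain Python (no PyTorch needed). The layer sizes, the random weights, and the helper names `affine`, `rand_matrix` and `linear_net` are all made up for illustration. It checks that for two stacked linear layers without an activation, f(0,0) + f(1,1) always equals f(0,1) + f(1,0). XOR needs the first pair of outputs to be low and the second pair to be high, so no choice of weights can satisfy it.

```python
import random

def affine(W, b, x):
    """Apply y = W x + b, with W given as a list of rows and b as a list."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def rand_matrix(rows, cols):
    """Random weights in [-1, 1]; stands in for any initialization."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)

# Two linear layers with no activation in between: 2 -> 3 -> 1.
# (Sizes are an assumption for this example, not taken from the question.)
W1, b1 = rand_matrix(3, 2), [random.uniform(-1, 1) for _ in range(3)]
W2, b2 = rand_matrix(1, 3), [random.uniform(-1, 1)]

def linear_net(x):
    """Output of the stacked layers, which is still an affine function of x."""
    return affine(W2, b2, affine(W1, b1, x))[0]

# Any affine function satisfies f(0,0) + f(1,1) == f(0,1) + f(1,0).
lhs = linear_net([0, 0]) + linear_net([1, 1])
rhs = linear_net([0, 1]) + linear_net([1, 0])
print("sums match:", abs(lhs - rhs) < 1e-9)
```

A final sigmoid does not change this, because it is monotonic: thresholding the sigmoid output is the same as thresholding the affine value underneath it. A non-linearity between the layers, such as ReLU, breaks the identity.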

This code should work for you:

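import torch
import torch.nn as nn
import torch.optim as optim

# input_dim, hidden_dim and output_dim come from your own script
# (XOR has 2 inputs and 1 output).
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                      nn.ReLU(),  # the non-linearity between the two linear layers
                      nn.Linear(hidden_dim, output_dim),
                      nn.Sigmoid())  # squashes the output into (0, 1)
# Use the GPU only if one is available; inputs and targets must be on the same device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
criterion = nn.MSELoss()
learning_rate = 1e-3
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000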

You may need to try a few random seeds, since this problem is sometimes a bit sensitive to the weight initialization.
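
Trying several seeds could look roughly like the sketch below. It reuses the model and hyperparameters from the snippet above. The `hidden_dim=8` default, the `train_xor` helper, the choice of seeds 0 to 2, and running on the CPU are my assumptions, not something from your code. It only prints the final loss per seed, so you can see which initializations get anywhere. If all of them stay stuck, a larger learning rate or more epochs may be worth a try.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# The four XOR input/target pairs.
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = torch.tensor([[0.], [1.], [1.], [0.]])

def train_xor(seed, hidden_dim=8, lr=1e-3, num_epochs=10000):
    """Train the model with one seed and return the loss from the last epoch.

    hidden_dim=8 is an assumed value; substitute your own.
    """
    torch.manual_seed(seed)  # fixes the random weight initialization
    model = nn.Sequential(nn.Linear(2, hidden_dim),
                          nn.ReLU(),
                          nn.Linear(hidden_dim, 1),
                          nn.Sigmoid())
    criterion = nn.MSELoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = criterion(model(X), Y)
        loss.backward()
        optimizer.step()
    return loss.item()

# Compare a handful of seeds; a lower final loss means that seed trained better.
final_losses = {seed: train_xor(seed) for seed in range(3)}
for seed, loss in final_losses.items():
    print(f"seed {seed}: final loss {loss:.4f}")
```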