Using PyTorch nn.Sequential
model, I’m unable to learn all four representation of the XOR booleans:
import numpy as np
import torch
from torch import nn
from torch.autograd import Variable
from torch import FloatTensor
from torch import optim
use_cuda = torch.cuda.is_available()
X = xor_input = np.array([[0,0], [0,1], [1,0], [1,1]])
Y = xor_output = np.array([[0,1,1,0]]).T
# Converting the X to PyTorch-able data structure.
X_pt = Variable(FloatTensor(X))
X_pt = X_pt.cuda() if use_cuda else X_pt
# Converting the Y to PyTorch-able data structure.
Y_pt = Variable(FloatTensor(Y), requires_grad=False)
Y_pt = Y_pt.cuda() if use_cuda else Y_pt
hidden_dim = 5
model = nn.Sequential(nn.Linear(input_dim, hidden_dim),
nn.Linear(hidden_dim, output_dim),
nn.Sigmoid())
criterion = nn.L1Loss()
learning_rate = 0.03
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
num_epochs = 10000
for _ in range(num_epochs):
predictions = model(X_pt)
loss_this_epoch = criterion(predictions, Y_pt)
loss_this_epoch.backward()
optimizer.step()
print([int(_pred > 0.5) for _pred in predictions], list(map(int, Y_pt)), loss_this_epoch.data[0])
After learning:
for _x, _y in zip(X_pt, Y_pt):
prediction = model(_x)
print('Input:\t', list(map(int, _x)))
print('Pred:\t', int(prediction))
print('Ouput:\t', int(_y))
print('######')
[out]:
Input: [0, 0]
Pred: 0
Ouput: 0
######
Input: [0, 1]
Pred: 1
Ouput: 1
######
Input: [1, 0]
Pred: 0
Ouput: 1
######
Input: [1, 1]
Pred: 0
Ouput: 0
######
I’ve tried running the same code over a couple of random seeds but it didn’t manage to learn all for XOR representation.
Without PyTorch, I could easily train a model with self-defined derivative functions and manually perform the backpropagation, see https://www.kaggle.io/svf/2342536/635025ecf1de59b71ea4fa03eb84f9f9/results.html#After-some-enlightenment
Why is it that the 2-layered MLP using PyTorch didn’t learn the XOR representation?