Hi there,
My problem is a very simple one, a toy one even, yet I’m struggling to find a solution. I created a 2D array X of random inputs with shape N x D, and a random vector b of size D (called beta in the code). The output is y = X @ b with no error term, just a plain matrix multiplication. I wanted my model to learn this linear dependency, but PyTorch refused to cooperate. Namely, its MSE doesn’t go below the variance of y. I created a completely analogous model in Keras and it worked well, just as expected.
Code is here:
import numpy as np
from keras.layers import Dense, Activation
from keras.models import Sequential
from keras.optimizers import SGD
from torch import nn, from_numpy
from torch.autograd import Variable
from torch import optim
from torch.nn.functional import mse_loss
X = np.random.randn(100, 5).astype(np.float32)
beta = np.random.randn(5).astype(np.float32)
y = X @ beta
tX = from_numpy(X)
ty = from_numpy(y)
keras_model = Sequential(layers=[Dense(input_shape=(5,), units=20, activation='relu'),
                                 Dense(units=1)])
torch_model = nn.Sequential(nn.Linear(5, 20), nn.ReLU(), nn.Linear(20, 1))
opt = optim.SGD(torch_model.parameters(), lr=1e-3, momentum=0.8)
keras_model.compile(SGD(lr=1e-3, momentum=0.8), loss='mse')
ITERS = 100
for i in range(ITERS):
    # torch_model.zero_grad()
    loss = mse_loss(torch_model(tX), ty)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(mse_loss(torch_model(tX), ty))
print(y.var())
keras_model.fit(X, y, batch_size=X.shape[0], epochs=ITERS)
PyTorch version: 1.0.0
Keras version: 2.0.4
Everything was trained on CPU. I didn’t add many more print statements to this code, but it turns out that the weights are updating, albeit only slightly.
I’m probably overlooking something completely trivial, but, well, maybe I’m blind.
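One thing that may be worth checking is whether the shapes of the model output and the target actually line up. The sketch below is a possible diagnostic, not a confirmed diagnosis. It uses stand-in tensors with the same shapes as in the code above (the seed and the name `ty_col` are my own additions), and it assumes the suspect is broadcasting between a `(100, 1)` prediction and a `(100,)` target.

```python
import torch
from torch.nn.functional import mse_loss

torch.manual_seed(0)  # arbitrary seed, only for reproducibility of the sketch

# Stand-ins for the tensors in the post: 100 samples, 5 features.
tX = torch.randn(100, 5)
ty = tX @ torch.randn(5)            # shape (100,)

model = torch.nn.Sequential(
    torch.nn.Linear(5, 20), torch.nn.ReLU(), torch.nn.Linear(20, 1)
)

pred = model(tX)                    # shape (100, 1)
print(pred.shape, ty.shape)

# A (100, 1) tensor minus a (100,) tensor broadcasts to (100, 100),
# so every prediction gets compared against every target.
diff_shape = (pred - ty).shape
print(diff_shape)

# Possible fix (an assumption, not verified against the original setup):
# give the target an explicit trailing dimension so the shapes match.
ty_col = ty.unsqueeze(1)            # shape (100, 1)
aligned_loss = mse_loss(pred, ty_col)
print(aligned_loss.item())
```

If the difference really is broadcast to `(100, 100)`, the loss averages `(pred_i - y_j)^2` over all pairs, which is smallest when every prediction equals the mean of y, and that minimum is exactly the variance of y. That would match the plateau I am seeing, so reshaping `ty` (or flattening the model output) inside the training loop seems like the first thing to try.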