My problem is very simple, a toy one even, yet I'm struggling to find a solution. I created a 2D array X of shape N x D filled with random inputs, then a random vector b of size D. The output is y = X @ b: a plain matrix multiplication with no error term. I wanted my model to learn this linear dependency, but PyTorch refused to cooperate: its MSE never drops below the variance of y. I built a completely analogous model in Keras, and it worked just as expected.
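Because there is no error term, a linear model can fit this data exactly, so the best achievable MSE is essentially zero, far below var(y). A minimal NumPy sanity check of that claim, assuming the same shapes as in the code below (the seeded generator and the use of `np.linalg.lstsq` are illustrative additions, not part of the original setup):

```python
import numpy as np

# Same data-generating process: y = X @ beta, no noise (seeded here for repeatability)
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5)).astype(np.float32)
beta = rng.standard_normal(5).astype(np.float32)
y = X @ beta

# Ordinary least squares should recover beta almost exactly
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
mse = float(np.mean((X @ beta_hat - y) ** 2))

print("beta recovered:", np.allclose(beta_hat, beta, atol=1e-4))
print("OLS MSE:", mse, "var(y):", float(y.var()))
```

If a model's training MSE plateaus at var(y) on this data, it is doing no better than predicting the mean of y.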
Code is here:
```python
import numpy as np
from keras.layers import Dense, Activation
from keras.models import Sequential
from keras.optimizers import SGD
from torch import nn, from_numpy
from torch.autograd import Variable
from torch import optim
from torch.nn.functional import mse_loss

# Random inputs and a random coefficient vector; the target is exactly linear
X = np.random.randn(100, 5).astype(np.float32)
beta = np.random.randn(5).astype(np.float32)
y = X @ beta

tX = from_numpy(X)
ty = from_numpy(y)

# Equivalent architectures: 5 -> 20 (ReLU) -> 1
keras_model = Sequential(layers=[Dense(input_shape=(5,), units=20, activation='relu'),
                                 Dense(units=1)])
torch_model = nn.Sequential(nn.Linear(5, 20), nn.ReLU(), nn.Linear(20, 1))

# Same optimizer settings for both frameworks
opt = optim.SGD(torch_model.parameters(), lr=1e-3, momentum=0.8)
keras_model.compile(SGD(lr=1e-3, momentum=0.8), loss='mse')

ITERS = 100

# Full-batch training loop for the PyTorch model
for i in range(ITERS):
    # torch_model.zero_grad()
    loss = mse_loss(torch_model(tX), ty)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mse_loss(torch_model(tX), ty))
print(y.var())

# Full-batch training for the Keras model (batch size = number of samples)
keras_model.fit(X, y, batch_size=X.shape[0], epochs=ITERS)
```
PyTorch version: 1.0.0
Keras version: 2.0.4
Everything was trained on CPU. I left most of my print statements out of this code, but they showed that the weights are updating, albeit only slightly.
I'm probably overlooking something completely trivial; maybe I'm just blind.
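One trivial thing worth checking is the shapes: `torch_model(tX)` has shape (100, 1) while `ty` has shape (100,). PyTorch's `mse_loss` broadcasts mismatched input and target shapes (recent versions emit a UserWarning about the size mismatch; worth verifying on 1.0.0), which would turn the loss into an average over all 100 x 100 prediction/target pairs. For such a loss, mean over i, j of (p_i - y_j)^2 equals mean over i of (p_i - mean(y))^2 plus var(y), so it can never drop below var(y). A NumPy sketch of that effect, assuming the broadcasting behaves like NumPy's (the helper `broadcast_mse` is hypothetical, written only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
beta = rng.standard_normal(5)
y = X @ beta                                   # target, shape (100,)

def broadcast_mse(pred, target):
    # Hypothetical helper: plain mean squared difference with NumPy broadcasting,
    # so (100, 1) - (100,) silently becomes a (100, 100) matrix of all pairs
    return float(np.mean((pred - target) ** 2))

perfect = (X @ beta).reshape(-1, 1)            # exact predictions, shape (100, 1)
constant = np.full((100, 1), y.mean())         # predict mean(y) everywhere

# Matching shapes: exact predictions give zero loss
print(broadcast_mse(perfect.ravel(), y))       # 0.0
# Mismatched shapes: exact predictions score about 2 * var(y)
print(broadcast_mse(perfect, y), 2 * y.var())
# The minimum under broadcasting is reached by the constant mean(y): var(y)
print(broadcast_mse(constant, y), y.var())
```

If this is the cause, reshaping the target so the shapes match, for example `mse_loss(torch_model(tX), ty.view(-1, 1))`, should let the loss go toward zero.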