Linear regression, PyTorch: MSE ~ Variance, Keras: 0 ~ MSE << Variance

Hi there,

My problem is very simple, a toy one even, yet I’m struggling to find a solution. I created a 2D array X of random inputs with shape N x D, and a random vector b of size D. The output is y = X @ b: a plain matrix multiplication with no error term. I wanted my model to learn this linear dependency, but PyTorch refused to cooperate: its MSE doesn’t go below the variance of y. I created a completely analogous model in Keras and it worked well, just as expected.

Code is here:

import numpy as np
from keras.layers import Dense, Activation
from keras.models import Sequential
from keras.optimizers import SGD

from torch import nn, from_numpy
from torch.autograd import Variable
from torch import optim
from torch.nn.functional import mse_loss

X = np.random.randn(100, 5).astype(np.float32)
beta = np.random.randn(5).astype(np.float32)

y = X @ beta

tX = from_numpy(X)
ty = from_numpy(y)

keras_model = Sequential(layers=[Dense(input_shape=(5,), units=20, activation='relu'), Dense(units=1)])
torch_model = nn.Sequential(nn.Linear(5, 20), nn.ReLU(), nn.Linear(20, 1))

opt = optim.SGD(torch_model.parameters(), lr=1e-3, momentum=0.8)
keras_model.compile(SGD(lr=1e-3, momentum=0.8), loss='mse')

ITERS = 100

for i in range(ITERS):
#     torch_model.zero_grad()
    opt.zero_grad()
    loss = mse_loss(torch_model(tX), ty)
    loss.backward()
    opt.step()

print(mse_loss(torch_model(tX), ty))

print(y.var())

keras_model.fit(X, y, batch_size=X.shape[0], epochs=ITERS)

PyTorch version: 1.0.0
Keras version: 2.0.4

Everything was trained on CPU. I didn’t add many print statements to this code, but it turns out that the weights are updating, albeit slightly.

I’m probably overlooking something completely trivial, but, well, maybe I’m blind.

It looks like the shape of your target ty might be wrong.
Currently it has the shape [100], while your model’s output is [100, 1].
This means that internally your target will be broadcast such that the operation
(torch_model(tX) - ty) yields a tensor of shape [100, 100], so every prediction is compared against every target.
This is most likely not what you want.
Try adding a dimension at dim 1 to your target using ty = ty.unsqueeze(1) before passing it to your loss, and your model should work.
Let me know if that helps.
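To make the shape issue concrete, here is a minimal sketch that uses random stand-in tensors rather than the actual model (pred and target are hypothetical names, not from the code above). It also hints at why the loss plateaus near the variance: when every prediction is compared against every target, the best a model can do is predict the target mean everywhere, and the loss at that point is the population variance of the target.

```python
import torch
from torch.nn.functional import mse_loss

torch.manual_seed(0)

# Stand-ins (assumed, for illustration only):
pred = torch.randn(100, 1)   # plays the role of torch_model(tX), shape [100, 1]
target = torch.randn(100)    # plays the role of ty, shape [100]

# Broadcasting pairs every prediction with every target.
diff = pred - target
print(diff.shape)            # torch.Size([100, 100])

# A constant prediction equal to the target mean, scored against the
# broadcast target, gives exactly the population variance of the target.
const_pred = target.mean().expand(100, 1)
loss_at_mean = ((const_pred - target) ** 2).mean()
print(loss_at_mean.item(), target.var(unbiased=False).item())

# With the extra dimension, shapes match and the loss is elementwise again.
fixed_target = target.unsqueeze(1)        # shape [100, 1]
fixed_mse = mse_loss(pred, fixed_target)  # scalar over 100 matched pairs
print(fixed_target.shape, fixed_mse.item())
```

The broadcast difference is computed by hand here instead of calling mse_loss with mismatched shapes, since that call's handling of mismatched shapes may differ between PyTorch versions.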


It did! The MSE now drops significantly below the variance, but it’s still worse than Keras (ceteris paribus, just as in the OP). Is this expected?

Nonetheless, thank you!