Hello,
as far as I know, a single linear layer with one output neuron should behave exactly like linear regression. I am trying to reproduce this, but without success so far.
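To illustrate what I mean by "the same", here is a minimal self-contained sketch with synthetic data (the arrays below are made up, not my kernel data): a Linear layer whose weights are set to the least-squares solution reproduces the regression predictions exactly.

```python
import numpy as np
import torch

# Synthetic stand-in data (made up for illustration).
rng = np.random.RandomState(0)
X = rng.randn(50, 3)
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true

# Closed-form least-squares solution, i.e. what
# LinearRegression(fit_intercept=False) computes.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# A Linear layer initialized with those weights is the same function.
layer = torch.nn.Linear(3, 1, bias=False).double()
with torch.no_grad():
    # nn.Linear stores weights with shape (out_features, in_features).
    layer.weight.copy_(torch.from_numpy(w_ls).reshape(1, -1))

pred = layer(torch.from_numpy(X)).detach().numpy().ravel()
print(np.abs(pred - X @ w_ls).max())  # ~0: identical predictions
```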
I have the following code for initialization and data loading:
import h5py
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
import numpy as np
import torch
torch.set_default_tensor_type('torch.DoubleTensor')
filename = '../kernels/A600-1.h5'
f = h5py.File(filename, 'r')
X_train = f['kernels/train_kernel'][:,:]
y_train = f['vectors/train_vector'][:]
X_test = f['kernels/test_kernel'][:,:]
y_test = f['vectors/test_vector'][:]
f.close()
Then I compute linear regression results with the following code:
lin_model = LinearRegression(fit_intercept=False)
lin_model.fit(X_train, y_train)
y_predicted = lin_model.predict(X_test)
error = mean_absolute_error(y_test, y_predicted)
print(error)
For training with a PyTorch Linear layer I use this code:
X2_train = torch.Tensor(X_train).double()
y2_train = torch.Tensor(y_train.reshape(-1,1)).double()
X2_test = torch.Tensor(X_test).double()
y2_test = torch.Tensor(y_test.reshape(-1,1)).double()
model = torch.nn.Linear(600, 1, bias=False)
model.double()
criterion = torch.nn.MSELoss(reduction='elementwise_mean')
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-8)
model.train()
for epoch in range(int(1e6)):
    optimizer.zero_grad()                    # Reset gradients
    model.zero_grad()                        # Just to be sure
    y_predicted = model(X2_train)            # Forward pass: predict
    loss = criterion(y_predicted, y2_train)  # Forward pass: calculate the loss
    loss.backward()                          # Backpropagation: calculate the gradients
    optimizer.step()                         # Update the weights
    if epoch % 1000 == 0:
        print('Epoch: {} - loss: {}'.format(epoch, loss.item()))
model.eval()
model.double()
y_predicted2 = model(X2_test).detach().numpy()
error2 = mean_absolute_error(y_test, y_predicted2)
print(error2)
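One sanity check I find useful here: evaluate the training MSE at the analytical least-squares solution, which gives the loss floor a correctly converged optimizer should approach. A minimal sketch with synthetic stand-in data (made up, since the kernel file isn't included):

```python
import numpy as np
import torch

# Synthetic stand-in data with small noise (made up for illustration).
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = X @ rng.randn(5) + 0.01 * rng.randn(100)

# Analytical least-squares weights.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Load them into a Linear layer and measure the MSE there.
layer = torch.nn.Linear(5, 1, bias=False).double()
with torch.no_grad():
    layer.weight.copy_(torch.from_numpy(w).reshape(1, -1))

criterion = torch.nn.MSELoss()
target_loss = criterion(layer(torch.from_numpy(X)),
                        torch.from_numpy(y).reshape(-1, 1)).item()
print(target_loss)  # the loss floor any optimizer should approach
```

If the loss printed during training stays far above this value, the optimizer has not converged, regardless of how long it ran.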
I have also tried the LBFGS optimizer; the code is similar:
X2_train = torch.Tensor(X_train).double()
y2_train = torch.Tensor(y_train.reshape(-1,1)).double()
X2_test = torch.Tensor(X_test).double()
y2_test = torch.Tensor(y_test.reshape(-1,1)).double()
model = torch.nn.Linear(600, 1, bias=False)
model.double()
criterion = torch.nn.MSELoss(reduction='elementwise_mean')
optimizer = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=1000, history_size=10000)
model.train()
for epoch in range(int(3e2)):
    def closure():
        optimizer.zero_grad()                    # Reset gradients
        model.zero_grad()                        # Just to be sure
        y_predicted = model(X2_train)            # Forward pass: predict
        loss = criterion(y_predicted, y2_train)  # Forward pass: calculate the loss
        loss.backward()                          # Backpropagation: calculate the gradients
        print('Epoch: {} - loss: {}'.format(epoch, loss.item()))
        return loss
    optimizer.step(closure)                      # Update the weights
model.eval()
model.double()
y_predicted2 = model(X2_test).detach().numpy()
error2 = mean_absolute_error(y_test, y_predicted2)
print(error2)
Both models use 600 weights, LinearRegression and the Linear layer alike, each without a bias term (I also tried enabling the bias, with no significant improvement), so they should behave identically. Both learn from the same data: 600 samples.
The analytical solution from LinearRegression has an L1 test error of 0.003666.
The Adam optimizer just runs for ages without converging to the analytical solution. LBFGS converges to a test error of 0.039252, roughly ten times higher.
It is also possible to check weights with:
print(lin_model.coef_)
for p in model.parameters():
print(p)
They differ, which I would expect if they had converged to different minima. However, this is plain linear regression, a convex problem, so both should converge to exactly the same weights.
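A direct way to compare the two weight vectors numerically is np.allclose. The arrays below are stand-ins for lin_model.coef_ and model.weight (made up for illustration):

```python
import numpy as np

# Stand-in for lin_model.coef_, shape (in_features,).
sklearn_coef = np.array([0.5, -1.0, 2.0])
# Stand-in for model.weight; nn.Linear stores it as (out_features, in_features).
torch_weight = np.array([[0.5, -1.0, 2.0]])

# Flatten the (1, n) PyTorch weight before comparing.
print(np.allclose(sklearn_coef, torch_weight.ravel(), atol=1e-6))  # True
```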
Can you see where the problem is?
My PyTorch version is 0.4.1.post2.
I can also share the training data with which I obtained these results.