Strange behaviour of linear layer

Hello community, I’m coming from mxnet, so I am rather new to PyTorch, and I am a bit surprised by the nn.Linear layer. For testing purposes I ran a simple linear regression: it works perfectly well when I use a plain tensor for the weights, yet it fails when I replace that tensor with a Linear layer.
More specifically, I try to fit a simple linear relation y = x * beta. With a tensor it works fine (I do not need a class here; I only wrote one to mirror the Linear version), but it fails when I replace the tensor with an nn.Linear layer.

Tensor approach

import numpy as np
import torch

# constants for data generation and training
nb_dim = 5
sample_size = 300
noise_level = 0.0001
lr = 0.05
num_epochs = 50

# dummy data for test
x = np.random.normal(size=(sample_size, nb_dim))
beta = 10 * (np.random.rand(nb_dim) - 0.5)
noise = np.random.rand(sample_size)
yn = np.dot(x, beta) + noise_level * noise

# numpy -> torch tensor
xn_ = torch.from_numpy(x).float()
yn_ = torch.from_numpy(yn).float()

# linear net y = x * beta
class Net(torch.nn.Module):

    def __init__(self, dim_in):
        super(Net, self).__init__()
        self.layer = torch.autograd.Variable(torch.randn(dim_in), requires_grad=True)

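    def forward(self, v):
        return torch.matmul(v, self.layer)

    # self.layer is a plain tensor, not a registered parameter,
    # so parameters() is overridden to expose it to the optimizer
    def parameters(self):
        return [self.layer]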

net = Net(len(beta))
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=lr)

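losses = []
for e in range(num_epochs):
    optimizer.zero_grad()      # reset gradients from the previous step
    output = net(xn_)
    l = loss(output, yn_)
    l.backward()               # compute gradients
    optimizer.step()           # update the weights
    losses.append(l.item())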

This works fine: the losses go to 0 and I recover the original beta with a small error.

Using Linear Layer

Yet if I change the Net class to use a Linear layer instead:

class Net(torch.nn.Module):

    def __init__(self, dim_in):
        super(Net, self).__init__()
        self.layer = torch.nn.Linear(dim_in, 1, bias=False)

    def forward(self, v):
        return self.layer(v)

then the losses do not converge to 0 and the fitted beta is far from the real value.

I am sure I am doing something wrong, but I can’t see where.
Thanks a lot for your help.

The problem is with your shapes.
Currently your output and target are both tensors of shape [batch_size].
This works fine in your manual approach, because you define the weight as torch.randn(dim_in), a 1-D tensor with no output dimension, so torch.matmul returns a tensor of shape [batch_size].

However, if you use the nn.Linear module, the output will have a shape of [batch_size, out_features].
In your case it will be [300, 1].

Now, if you calculate the MSELoss of the output ([300, 1]) and the target ([300]), the two tensors are broadcast internally to [300, 300], so the loss is averaged over every output/target pair instead of per sample, which is not what you want here.
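To make the shape mismatch concrete, here is a small sketch that uses random tensors in place of the real model output and target (the values are made up; only the shapes matter). It computes the squared error by hand instead of calling MSELoss, so the broadcast result is visible directly.

```python
import torch

torch.manual_seed(0)

output = torch.randn(300, 1)   # same shape as the output of nn.Linear(dim_in, 1)
target = torch.randn(300)      # same shape as the 1-D target yn_

# Subtracting [300] from [300, 1] broadcasts both operands to [300, 300].
diff = output - target
print(diff.shape)

# Mean over all 300 * 300 output/target pairs: not the per-sample loss.
broadcast_mse = (diff ** 2).mean()

# Dropping the trailing dimension first gives the per-sample difference.
per_sample = output.squeeze(1) - target
intended_mse = (per_sample ** 2).mean()
print(broadcast_mse.item(), intended_mse.item())
```

The two printed loss values differ, which is why the optimizer is driven toward the wrong solution.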

This issue is being tracked here.

You can easily fix this by unsqueezing dim 1 of your target in the nn.Linear approach:

l = loss(output, yn_.unsqueeze(1))
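For reference, a self-contained sketch of the nn.Linear version with this fix applied. It reuses the constants from the question, with two assumptions of mine: fixed random seeds for reproducibility, and 200 epochs instead of 50 so that the fit has clearly converged. The Linear layer is used directly rather than wrapped in a Net class, to keep the example short.

```python
import numpy as np
import torch

np.random.seed(0)
torch.manual_seed(0)

nb_dim, sample_size, noise_level = 5, 300, 0.0001
lr, num_epochs = 0.05, 200  # assumption: more epochs than the original 50

# dummy data: yn = x . beta + small noise
x = np.random.normal(size=(sample_size, nb_dim))
beta = 10 * (np.random.rand(nb_dim) - 0.5)
yn = np.dot(x, beta) + noise_level * np.random.rand(sample_size)

xn_ = torch.from_numpy(x).float()
yn_ = torch.from_numpy(yn).float()

layer = torch.nn.Linear(nb_dim, 1, bias=False)
loss = torch.nn.MSELoss()
optimizer = torch.optim.SGD(layer.parameters(), lr=lr)

for e in range(num_epochs):
    optimizer.zero_grad()
    output = layer(xn_)                  # shape [300, 1]
    l = loss(output, yn_.unsqueeze(1))   # target reshaped to [300, 1]
    l.backward()
    optimizer.step()

# layer.weight has shape [1, nb_dim]; flatten it to compare with beta
beta_hat = layer.weight.detach().numpy().ravel()
print("max abs error:", np.abs(beta_hat - beta).max())
```

Calling output.squeeze(1) instead of unsqueezing the target should work equally well; what matters is that both arguments to the loss have the same shape.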

Also note that Variables have been deprecated since PyTorch 0.4.0. If you are using a newer version, you can just create tensors with requires_grad=True.
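A minimal sketch of what that looks like, assuming PyTorch 0.4.0 or newer. The second half goes one step beyond the suggestion above and wraps the weight in torch.nn.Parameter, which registers it with the module so that parameters() does not need to be overridden; the attribute name weight is my own choice.

```python
import torch

torch.manual_seed(0)

# A plain tensor tracks gradients without being wrapped in a Variable.
w = torch.randn(5, requires_grad=True)

x = torch.randn(300, 5)
y = torch.matmul(x, w)          # shape [300]
y.pow(2).mean().backward()      # dummy scalar loss, just to populate w.grad
print(w.grad.shape)


class Net(torch.nn.Module):

    def __init__(self, dim_in):
        super().__init__()
        # nn.Parameter registers the tensor, so net.parameters() finds it
        self.weight = torch.nn.Parameter(torch.randn(dim_in))

    def forward(self, v):
        return torch.matmul(v, self.weight)


net = Net(5)
print([p.shape for p in net.parameters()])
```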


Great, thanks a lot for your help, much appreciated!