Model weights are not updating but biases are

mihkell · June 23, 2018, 10:24am

I have a very simple model in which the weights are not updating but the biases are. I haven’t been able to figure out why (full code here).

Model:

import torch
import torch.nn as nn

class MnistModel(nn.Module):
    def __init__(self, batch_size):
        super(MnistModel, self).__init__()
        self.batch_size = batch_size
        # One 784x10 weight matrix per sample position in the batch
        self.w = torch.nn.Parameter(torch.empty(batch_size, 784, 10).uniform_(0, 1))
        self.b = torch.nn.Parameter(torch.empty(10).uniform_(0, 1))

    def forward(self, x):
        # x: (batch_size, 1, 784) -> output: (batch_size, 1, 10)
        return torch.bmm(x, self.w) + self.b
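To make the shapes concrete, here is a minimal sketch that runs the forward pass above with a toy batch size of 4 and random inputs standing in for MNIST (both are assumptions for illustration). Because w has a leading batch dimension, torch.bmm multiplies sample i only by its own slice w[i], so a loss that depends only on sample 0 sends no gradient to the other slices.

```python
import torch
import torch.nn as nn


class MnistModel(nn.Module):
    def __init__(self, batch_size):
        super(MnistModel, self).__init__()
        self.batch_size = batch_size
        # One 784x10 weight matrix per batch position, as in the question
        self.w = nn.Parameter(torch.empty(batch_size, 784, 10).uniform_(0, 1))
        self.b = nn.Parameter(torch.empty(10).uniform_(0, 1))

    def forward(self, x):
        # (B, 1, 784) bmm (B, 784, 10) -> (B, 1, 10); b broadcasts over the last dim
        return torch.bmm(x, self.w) + self.b


torch.manual_seed(0)
batch_size = 4  # toy value for illustration
model = MnistModel(batch_size)
x = torch.rand(batch_size, 1, 784)  # random stand-in for flattened images

out = model(x)
print(out.shape)  # torch.Size([4, 1, 10])

# Backpropagate a loss that depends on sample 0 only
out[0].sum().backward()
# Only w[0] receives a gradient; w[1:] is untouched by sample 0
print(model.w.grad[0].abs().sum() > 0)    # tensor(True)
print(model.w.grad[1:].abs().sum() == 0)  # tensor(True)
```

In other words, each weight slice is trained by exactly one sample per batch, while the bias is shared by all of them.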

And the training loop:

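for raw_data, raw_target in train_loader:
    # Flatten each 28x28 image to (batch_size, 1, 784); assumes every batch is full
    data = raw_data.view((batch_size, 1, 784))

    logits = model(data)
    # Reset the target tensor, then set a 1 at each sample's label (one-hot)
    zeros[:, :, :] = 0
    zeros[rows, 0, raw_target] = 1
    loss = criterion(logits, zeros.float())

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()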
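The loop above relies on names defined elsewhere in the full code (batch_size, zeros, rows, criterion, optimizer, train_loader). Below is a self-contained sketch under assumed choices: a synthetic TensorDataset stands in for MNIST, rows is taken to be torch.arange(batch_size), criterion is assumed to be nn.BCEWithLogitsLoss, and the optimizer is assumed to be plain SGD. It also checks that torch.nn.functional.one_hot builds the same one-hot targets as the preallocated zeros tensor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


class MnistModel(nn.Module):
    # Same model as in the question: one weight matrix per batch position
    def __init__(self, batch_size):
        super(MnistModel, self).__init__()
        self.w = nn.Parameter(torch.empty(batch_size, 784, 10).uniform_(0, 1))
        self.b = nn.Parameter(torch.empty(10).uniform_(0, 1))

    def forward(self, x):
        return torch.bmm(x, self.w) + self.b


torch.manual_seed(0)
batch_size = 8
# Synthetic stand-in for MNIST: 64 "images" of shape (1, 28, 28) with labels 0..9
images = torch.rand(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))
# drop_last=True so every batch can be viewed as (batch_size, 1, 784)
train_loader = DataLoader(TensorDataset(images, labels), batch_size=batch_size,
                          shuffle=True, drop_last=True)

model = MnistModel(batch_size)
criterion = nn.BCEWithLogitsLoss()  # assumed; the original criterion is not shown
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed optimizer and lr

zeros = torch.zeros(batch_size, 1, 10, dtype=torch.long)
rows = torch.arange(batch_size)  # assumed meaning of `rows`

for raw_data, raw_target in train_loader:
    data = raw_data.view((batch_size, 1, 784))
    logits = model(data)

    # One-hot targets, built as in the question...
    zeros[:, :, :] = 0
    zeros[rows, 0, raw_target] = 1
    # ...and the equivalent built-in, reshaped to (batch_size, 1, 10)
    one_hot = F.one_hot(raw_target, num_classes=10).unsqueeze(1)
    assert torch.equal(zeros, one_hot)

    loss = criterion(logits, zeros.float())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())
```

Using F.one_hot avoids keeping a preallocated target tensor in sync with the batch size.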

When I run this code, it shows the following (the bias and weights are taken from model.parameters()):

weight[0][0][0] before training: tensor(0.7576)
Bias[0] before training: tensor(0.3681)

weight[0][0][0] after training:  tensor(0.7576)
Bias[0] after training tensor(-48.1940)
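One plausible reason the printed weight[0][0][0] never moves (this is not confirmed in the thread): for out = torch.bmm(x, w) + b, the chain rule gives dL/dw[i] = x[i]^T · dL/dout[i], so every gradient entry for w[i, j, :] is scaled by input feature j of sample i. weight[0][0][0] multiplies pixel 0, the top-left corner of the image; if that pixel is 0 in every image, as border pixels of unnormalized MNIST typically are, that weight's gradient is exactly zero and plain SGD never changes it. The bias gradient, by contrast, sums dL/dout over the whole batch, so it is almost never zero. The sketch below checks this with autograd on random data whose feature 0 is forced to zero (an assumption standing in for a blank corner pixel), using BCE-with-logits as an assumed loss.

```python
import torch

torch.manual_seed(0)
B, D, C = 4, 784, 10  # toy batch size, input features, classes

x = torch.rand(B, 1, D)
x[:, :, 0] = 0.0  # feature 0 is zero for every sample (stand-in for a blank corner pixel)
w = (torch.rand(B, D, C) * 0.01).requires_grad_()  # small init keeps logits moderate
b = torch.rand(C, requires_grad=True)

out = torch.bmm(x, w) + b  # (B, 1, C)
target = torch.zeros(B, 1, C)
target[:, 0, 0] = 1.0
loss = torch.nn.functional.binary_cross_entropy_with_logits(out, target)
loss.backward()

# Manual gradients: dL/dout for BCE-with-logits (mean reduction), then the chain rule
grad_out = (torch.sigmoid(out) - target) / out.numel()
manual_w = torch.bmm(x.transpose(1, 2), grad_out)  # (B, D, C)
manual_b = grad_out.sum(dim=(0, 1))                  # (C,)

print(torch.allclose(w.grad, manual_w, atol=1e-7))  # True
print(torch.allclose(b.grad, manual_b, atol=1e-7))  # True
# Weights attached to the always-zero feature get exactly zero gradient...
print(w.grad[:, 0, :].abs().max().item())  # 0.0
# ...while the bias still gets a nonzero gradient
print(b.grad.abs().max().item() > 0)  # True
```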

mihkell · June 23, 2018, 11:15am

It has to be something very basic, because I’ve run out of ideas…

mteser · June 23, 2018, 1:17pm

I think some of the weights are updating. When I sum all of the weights, I get a different value after training than before, although it is still very close to the original.

sum(weight) before training: tensor(1.00000e+06 * 3.9201)
Bias[0] before training: tensor(0.3681)

sum(weight) after training:  tensor(1.00000e+06 * 3.8118)
Bias[0] after training tensor(-47.6196)
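A single sum over millions of elements says little about how many weights moved or by how much. A more direct check, sketched below with hypothetical helpers (snapshot and report_changes are illustrative names, not PyTorch API), clones every parameter before a step and reports, per parameter, the fraction of elements that changed and the largest absolute change. The toy copy of the question's model (renamed PerSampleModel here), the data with 100 blank pixels, and the SGD settings are all assumptions for illustration.

```python
import torch
import torch.nn as nn


def snapshot(model):
    # Detached copies of every parameter, keyed by name
    return {name: p.detach().clone() for name, p in model.named_parameters()}


def report_changes(before, model):
    # For each parameter: (fraction of elements that changed, largest absolute change)
    report = {}
    for name, p in model.named_parameters():
        diff = (p.detach() - before[name]).abs()
        report[name] = ((diff > 0).float().mean().item(), diff.max().item())
    return report


class PerSampleModel(nn.Module):
    # Same structure as MnistModel in the question
    def __init__(self, batch_size, d_in=784, d_out=10):
        super().__init__()
        self.w = nn.Parameter(torch.empty(batch_size, d_in, d_out).uniform_(0, 1))
        self.b = nn.Parameter(torch.empty(d_out).uniform_(0, 1))

    def forward(self, x):
        return torch.bmm(x, self.w) + self.b


torch.manual_seed(0)
B = 4
model = PerSampleModel(B)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.rand(B, 1, 784)
x[:, :, :100] = 0.0  # pretend the first 100 pixels are blank in every image
target = torch.zeros(B, 1, 10)
target[:, 0, 3] = 1.0

before = snapshot(model)
loss = nn.functional.binary_cross_entropy_with_logits(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

for name, (frac, max_change) in report_changes(before, model).items():
    print(f"{name}: {frac:.1%} of elements changed, max |change| = {max_change:.3g}")
```

Comparing element-wise keeps small but real updates visible instead of folding them into one large total.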

mteser · June 23, 2018, 1:55pm

I am not absolutely sure on this point, but do you really want batch_size to be one dimension of your weight w?
Shouldn’t the weight be the same for every element in the batch, so that you just expand it in the forward pass, like this:

import torch
import torch.nn as nn

class MnistModel(nn.Module):
    def __init__(self, batch_size):
        super(MnistModel, self).__init__()
        # A single weight matrix shared by every sample in the batch
        self.w = torch.nn.Parameter(torch.empty(784, 10).uniform_(0, 1))
        self.b = torch.nn.Parameter(torch.empty(10).uniform_(0, 1))

    def forward(self, x):
        # Broadcast w to (batch, 784, 10) as a view, without copying memory
        exp_w = self.w.expand(x.size(0), self.w.size(0), self.w.size(1))
        return torch.bmm(x, exp_w) + self.b

?
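With a single shared weight of shape (784, 10), the expand plus torch.bmm in the forward pass above gives the same result as a plain torch.matmul, which broadcasts the 2-D weight over the batch dimension; nn.Linear(784, 10) is the built-in layer for the same computation (it stores its weight transposed, as (10, 784)). The sketch below, with toy sizes and random data as assumptions, checks the equivalence and shows that the shared weight's gradient is the sum of the per-sample contributions, so every sample in the batch now updates the same matrix.

```python
import torch
import torch.nn as nn


class MnistModel(nn.Module):
    # Shared-weight version: one (784, 10) weight used by every sample
    def __init__(self, batch_size=None):  # batch_size kept only for the original signature
        super(MnistModel, self).__init__()
        self.w = nn.Parameter(torch.empty(784, 10).uniform_(0, 1))
        self.b = nn.Parameter(torch.empty(10).uniform_(0, 1))

    def forward(self, x):
        exp_w = self.w.expand(x.size(0), self.w.size(0), self.w.size(1))
        return torch.bmm(x, exp_w) + self.b


torch.manual_seed(0)
model = MnistModel()
x = torch.rand(4, 1, 784)

out_bmm = model(x)
out_matmul = torch.matmul(x, model.w) + model.b  # broadcasting, no explicit expand

# nn.Linear computes x @ weight.T + bias, so copy w in transposed
linear = nn.Linear(784, 10)
with torch.no_grad():
    linear.weight.copy_(model.w.t())
    linear.bias.copy_(model.b)
out_linear = linear(x)

print(torch.allclose(out_bmm, out_matmul, rtol=1e-4, atol=1e-4))  # True
print(torch.allclose(out_bmm, out_linear, rtol=1e-4, atol=1e-4))  # True

# The gradient of the shared weight accumulates over the whole batch
out_bmm.sum().backward()
per_sample = sum(x[i, 0].unsqueeze(1) * torch.ones(1, 10) for i in range(4))
print(torch.allclose(model.w.grad, per_sample, rtol=1e-4, atol=1e-4))  # True
```

The backward pass of expand sums the gradient over the expanded batch dimension, which is why the shared weight sees every sample.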

mihkell · June 24, 2018, 5:17am

@mteser that is true indeed. =)

mihkell · June 24, 2018, 10:23am

With your pointers, and after switching the optimizer to torch.optim.Adam, I was able to make the weights change a little more.
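One plausible reason switching to torch.optim.Adam made the weights move more (not stated in the thread): Adam divides each parameter's update by a running estimate of its gradient's magnitude, so on the first step every parameter with a nonzero gradient moves by roughly lr, whereas SGD moves it by lr times the raw gradient, which is tiny for weights whose inputs are small. The toy example below (a two-element parameter with gradients 1e-4 and 1.0, chosen to exaggerate the effect) compares one step of each optimizer with lr=0.1.

```python
import torch


def one_step(opt_cls, **kwargs):
    # Two-element parameter: one weakly and one strongly weighted element
    p = torch.nn.Parameter(torch.tensor([1.0, 1.0]))
    opt = opt_cls([p], **kwargs)
    loss = 1e-4 * p[0] + 1.0 * p[1]  # gradients: [1e-4, 1.0]
    opt.zero_grad()
    loss.backward()
    opt.step()
    # How far each element moved (both move downhill, i.e. decrease)
    return torch.tensor([1.0, 1.0]) - p.detach()


sgd_delta = one_step(torch.optim.SGD, lr=0.1)    # moves by lr * grad
adam_delta = one_step(torch.optim.Adam, lr=0.1)  # moves by about lr for both elements
print(sgd_delta)
print(adam_delta)
```

With SGD the first element moves by about 1e-5 while the second moves by 0.1; with Adam both move by about 0.1.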