Biases are not updating in the decoder model

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# input_shape, device, and learning_rate are assumed to be defined elsewhere
class Framework(nn.Module):
    def __init__(self):
        super(Framework, self).__init__()
        self.fc1 = nn.Linear(input_shape, 512)
        self.fc21 = nn.Linear(512, 128)
        self.fc22 = nn.Linear(512, 128)
        self.fc3 = nn.Linear(128, 512)
        self.fc4 = nn.Linear(512, input_shape)
        for m in self.modules():
            if isinstance(m, nn.Linear):
                torch.nn.init.xavier_uniform_(m.weight)
                torch.nn.init.zeros_(m.bias)

    def encoder(self, x):
        h1 = F.relu(self.fc1(x)+self.fc1.bias)
        return self.fc21(h1)+self.fc21.bias, F.elu(self.fc22(h1)+self.fc22.bias)+1

    def sampling(self, mu, var):
        eps= torch.nn.init.uniform_(var, a=0.0, b=1.0)
        return mu + eps*var

    def decoder(self, z):
        h3 = F.relu(self.fc3(z)+self.fc3.bias)
        #print(self.fc3.bias)
        return torch.sigmoid(self.fc4(h3)+self.fc4.bias)

    def forward(self, x):
        mu, var = self.encoder(x.view(-1, input_shape))
        z = self.sampling(mu, var)
        return self.decoder(z), mu, var

model1 = Framework().to(device)
model2 = Framework().to(device)
model3 = Framework().to(device)

import itertools
# define optimizers
f_params=model1.parameters()
s_params=model2.parameters()
t_params=model3.parameters()
dvne_params=itertools.chain(f_params,s_params,t_params)
optimizer = optim.RMSprop(dvne_params, lr=learning_rate)


@ptrblck can you please help me with this?

I understand your intuition, but PyTorch linear layers use the bias by default during initialization as well as in the forward and backward passes. You don't need to add it explicitly.

Please see:
pytorch/linear.py
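As a quick illustration (a minimal sketch with made-up shapes), nn.Linear already computes x @ W.T + b, so writing self.fc1(x) + self.fc1.bias adds the bias a second time:

import torch
import torch.nn as nn

fc = nn.Linear(4, 3)
x = torch.randn(2, 4)

# nn.Linear already computes x @ W.T + b
print(torch.allclose(fc(x), x @ fc.weight.t() + fc.bias))  # True

# so fc(x) + fc.bias applies the bias twice
print(torch.allclose(fc(x) + fc.bias, x @ fc.weight.t() + 2 * fc.bias))  # True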

So why is the bias not updating in the fc3 and fc4 layers?
Do you think there is a conceptual problem or a coding problem here?

Have you tried removing the +self.fc*.bias parts from the above snippet, or plotting the bias values every epoch?

Yes, I have tried getting the bias values after each epoch: fc1 and fc2 are updating nicely, but the fc3 and fc4 biases are not updating.

The code looks generally alright.
Could you check the gradients of all parameters? Maybe you are dealing with vanishing gradients?

This dummy code snippet prints valid gradients for all parameters:

input_shape = 1
model = Framework()

x = torch.randn(1, 1)
out = model(x)
out[0].mean().backward()

for name, param in model.named_parameters():
    print(name, param.grad.abs().sum())
fc1.weight tensor(1.2002)
fc1.bias tensor(5.8217)
fc21.weight tensor(8.7014)
fc21.bias tensor(2.8945)
fc22.weight tensor(8.8635)
fc22.bias tensor(2.9484)
fc3.weight tensor(177.9702)
fc3.bias tensor(7.6872)
fc4.weight tensor(15.0663)
fc4.bias tensor(0.4998)

Sorry, I could not figure out how to solve the issue.
For me the output looks something like this:

fc1.weight tensor(5.9704)
fc1.bias tensor(4.6652)
fc21.weight tensor(191.8801)
fc21.bias tensor(1.5342)
fc22.weight tensor(382.4296)
fc22.bias tensor(3.0577)
fc3.weight tensor(455.5275)
fc3.bias tensor(3.5586)
fc4.weight tensor(39.6243)
fc4.bias tensor(0.2452)

Your output also shows that fc3.bias and fc4.bias have valid gradients.
As long as you pass these parameters to the optimizer (e.g. via optim.SGD(model.parameters(), ...)), they will be updated.

How are you checking whether they are updated or not?
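One way to check (a minimal sketch, assuming the model and optimizer defined above) is to clone the biases before an optimizer step and compare them afterwards:

# snapshot the decoder biases before the update
fc3_bias_before = model.fc3.bias.detach().clone()
fc4_bias_before = model.fc4.bias.detach().clone()

optimizer.step()

# a nonzero difference means the parameter was actually updated
print((model.fc3.bias.detach() - fc3_bias_before).abs().sum())
print((model.fc4.bias.detach() - fc4_bias_before).abs().sum())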


I was checking by printing model1.state_dict(), but they didn't update.
Thanks again @ptrblck, I got the solution just now.
I was missing Variable() in the above line:

eps = Variable(torch.randn_like(std))

I know the biases are updating now, but if possible, can you check whether I am conceptually correct?

Variables are deprecated since PyTorch 0.4, so you should use tensors now.
It seems eps shouldn't be updated, so you should be able to create it as a tensor without setting requires_grad=True.
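For reference, a minimal sketch of the sampling step written that way, assuming mu and std come from the encoder (the usual reparameterization trick):

def sampling(self, mu, std):
    # randn_like creates a fresh noise tensor; it does not require gradients itself,
    # but gradients still flow to mu and std through the arithmetic below
    eps = torch.randn_like(std)
    return mu + eps * std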