Hi, I'm trying to initialize my output layer so that it yields a differentiable identity transformation grid. I initialized the weights of my final Linear to 0 and its bias to the identity transformation grid, but running images through the initialized model yields unexpected results.
Here are my model, the initial bias and weight, and the output of the first iteration. Shouldn't the output be identical to the bias if all weights are initialized to 0? Thanks!
```python
class Net(nn.Module):
    def __init__(self, grid_size):
        super().__init__()
        self.conv = get_conv(grid_size).to(DEVICE)
        self.flatten = nn.Flatten().to(DEVICE)
        self.linear1 = nn.Sequential(nn.Linear(80, 20), nn.ReLU()).to(DEVICE)
        self.linear2 = nn.Linear(20, 2 * grid_size * grid_size).to(DEVICE)
        self.linear2.bias = nn.Parameter(init_grid(grid_size).view(-1)).to(DEVICE)
        self.linear2.weights = torch.empty(2 * grid_size * grid_size).fill_(float(0)).to(DEVICE)

    def forward(self, x):
        x = self.conv(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.linear2(x)
        return x
```
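For reference, here is a minimal standalone sketch (toy sizes, not my real model) of the behavior I'm expecting: with its weight zeroed in place, a plain `nn.Linear` should output exactly its bias for any input.

```python
import torch
from torch import nn

# Hypothetical small layer standing in for linear2.
linear = nn.Linear(4, 3)
with torch.no_grad():
    linear.weight.zero_()      # zero the actual .weight Parameter in place
    linear.bias.fill_(0.2857)  # stand-in for the identity-grid bias

x = torch.randn(8, 4)          # arbitrary batch
out = linear(x)
# With weight == 0, out = x @ 0 + bias, so every row equals the bias.
print(torch.allclose(out, linear.bias.expand_as(out)))  # True
```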
bias of the model:

```
Parameter containing:
tensor([0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857], device='cuda:0', requires_grad=True)
```
weight of the model:

```
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
```
output of first iter:

```
tensor([[0.2810, 0.2688, 0.2780,  ..., 0.2805, 0.3007, 0.3204],
        [0.2841, 0.2690, 0.2759,  ..., 0.2849, 0.3014, 0.3178],
        [0.2862, 0.2699, 0.2783,  ..., 0.2825, 0.3005, 0.3211],
        ...,
        [0.2856, 0.2715, 0.2779,  ..., 0.2824, 0.2960, 0.3212],
        [0.2843, 0.2690, 0.2798,  ..., 0.2816, 0.3005, 0.3203],
        [0.2856, 0.2733, 0.2791,  ..., 0.2846, 0.3015, 0.3240]],
```