Hi, I'm trying to initialize my output layer so that it yields a differentiable identity transformation grid. I initialized the weights of my final Linear to 0 and its bias to the identity transformation grid, but running images through the initialized model yields unexpected results.
Here are my model, the initial bias and weight, and the output of the first iteration. Shouldn't the output be identical to the bias if all weights are initialized to 0? Thanks!
```python
class Net(nn.Module):
    def __init__(self, grid_size):
        super().__init__()
        self.conv = get_conv(grid_size).to(DEVICE)
        self.flatten = nn.Flatten().to(DEVICE)
        self.linear1 = nn.Sequential(nn.Linear(80, 20), nn.ReLU()).to(DEVICE)
        self.linear2 = nn.Linear(20, 2 * grid_size * grid_size).to(DEVICE)
        self.linear2.bias = nn.Parameter(init_grid(grid_size).view(-1)).to(DEVICE)
        self.linear2.weights = torch.empty(2 * grid_size * grid_size).fill_(float(0)).to(DEVICE)

    def forward(self, x):
        x = self.conv(x)
        x = self.flatten(x)
        x = self.linear1(x)
        x = self.linear2(x)
        return x
```
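For reference, here is a minimal standalone sketch (toy sizes, not my real model) of the behavior I'm expecting: with its weight zeroed in place, a plain `nn.Linear` should output exactly its bias for any input.

```python
import torch
from torch import nn

# Hypothetical small layer standing in for linear2.
linear = nn.Linear(4, 3)
with torch.no_grad():
    linear.weight.zero_()      # zero the actual .weight Parameter in place
    linear.bias.fill_(0.2857)  # stand-in for the identity-grid bias

x = torch.randn(8, 4)          # arbitrary batch
out = linear(x)
# With weight == 0, out = x @ 0 + bias, so every row equals the bias.
print(torch.allclose(out, linear.bias.expand_as(out)))  # True
```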
bias of the model:

```
Parameter containing:
tensor([0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857, 0.2857,
        0.2857, 0.2857], device='cuda:0', requires_grad=True)
```
weight of the model:

```
tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
        0., 0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
```
output of first iter:

```
tensor([[0.2810, 0.2688, 0.2780,  ..., 0.2805, 0.3007, 0.3204],
        [0.2841, 0.2690, 0.2759,  ..., 0.2849, 0.3014, 0.3178],
        [0.2862, 0.2699, 0.2783,  ..., 0.2825, 0.3005, 0.3211],
        ...,
        [0.2856, 0.2715, 0.2779,  ..., 0.2824, 0.2960, 0.3212],
        [0.2843, 0.2690, 0.2798,  ..., 0.2816, 0.3005, 0.3203],
        [0.2856, 0.2733, 0.2791,  ..., 0.2846, 0.3015, 0.3240]],
```