At some point in my network I have three tensors (x1, x2, x3) that I would like to sum with a learned weighting scheme. Even though I have tried to be explicit that my 'host_weights' vector of three weights requires a gradient, it doesn't seem to get updated at all:
def __init__(self, **kwargs):
    ....
    # register the three mixing weights as a learnable parameter
    self.register_parameter(name='host_weights', param=nn.Parameter(torch.randn(3, requires_grad=True)))

def forward(self, x1, x2, x3):
    x1 = self.resnet_x1(x1)
    x2 = self.resnet_x2(x2)
    x3 = self.resnet_x3(x3)
    # softmax so the three weights sum to 1
    weights_scaled = F.softmax(self.host_weights, dim=0)
    print(self.host_weights)
    # learned weighted sum of the three branch outputs
    x = weights_scaled[0]*x1 + weights_scaled[1]*x2 + weights_scaled[2]*x3
    ...
However, the printed values never change across iterations:
Parameter containing:
tensor([-0.8687, -0.4497, -0.9619], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.8687, -0.4497, -0.9619], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.8687, -0.4497, -0.9619], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.8687, -0.4497, -0.9619], device='cuda:0', requires_grad=True)
I'm not sure why, however. Could the indexing be creating a break in autograd?
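As a minimal standalone sketch of what I mean (not my actual model, the tensors are just dummies), this is the kind of check I would run to see whether indexing into the softmaxed parameter blocks the gradient:

import torch
import torch.nn as nn
import torch.nn.functional as F

# standalone check: does indexing into a softmaxed parameter block the gradient?
host_weights = nn.Parameter(torch.randn(3, requires_grad=True))
x1, x2, x3 = torch.randn(5), torch.randn(5), torch.randn(5)

weights_scaled = F.softmax(host_weights, dim=0)
x = weights_scaled[0]*x1 + weights_scaled[1]*x2 + weights_scaled[2]*x3

x.sum().backward()
print(host_weights.grad)  # a non-None gradient here would mean the indexing itself is fine

In the real model I would presumably inspect self.host_weights.grad after loss.backward() in the same way.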