At some point in my network I have three tensors (x1, x2, x3) that I would like to sum with a learned weighting scheme. Even though I have tried to be explicit that my 'host_weights' vector of three weights requires a gradient, it doesn't seem to get updated at all:
def __init__(self, **kwargs):
    ....
    # register the three mixing weights as a learnable parameter
    self.register_parameter(name='host_weights', param=nn.Parameter(torch.randn(3, requires_grad=True)))

def forward(self, x1, x2, x3):
    x1 = self.resnet_x1(x1)
    x2 = self.resnet_x2(x2)
    x3 = self.resnet_x3(x3)
    # softmax so the three weights sum to 1
    weights_scaled = F.softmax(self.host_weights, dim=0)
    print(self.host_weights)
    # learned weighted sum of the three branch outputs
    x = weights_scaled[0]*x1 + weights_scaled[1]*x2 + weights_scaled[2]*x3
    ...
However, the printed values never change across iterations:
Parameter containing:
tensor([-0.8687, -0.4497, -0.9619], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.8687, -0.4497, -0.9619], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.8687, -0.4497, -0.9619], device='cuda:0', requires_grad=True)
Parameter containing:
tensor([-0.8687, -0.4497, -0.9619], device='cuda:0', requires_grad=True)
I'm not sure why, however. Could the indexing be creating a break in autograd?
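As a minimal standalone sketch of what I mean (not my actual model, the tensors are just dummies), this is the kind of check I would run to see whether indexing into the softmaxed parameter blocks the gradient:

import torch
import torch.nn as nn
import torch.nn.functional as F

# standalone check: does indexing into a softmaxed parameter block the gradient?
host_weights = nn.Parameter(torch.randn(3, requires_grad=True))
x1, x2, x3 = torch.randn(5), torch.randn(5), torch.randn(5)

weights_scaled = F.softmax(host_weights, dim=0)
x = weights_scaled[0]*x1 + weights_scaled[1]*x2 + weights_scaled[2]*x3

x.sum().backward()
print(host_weights.grad)  # a non-None gradient here would mean the indexing itself is fine

In the real model I would presumably inspect self.host_weights.grad after loss.backward() in the same way.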