Multi GPU Hook not correctly filling buffer

Following up on Multi GPU backwards hook on wrong device: that fix stopped the error, but the code still does not return the same values with 1 GPU as with 2. I was told in Initializing a member tensor after creation with DataParallel (repost) that I shouldn't be using register_backward_hook at all, so now I am doing the same thing with a custom autograd function. I think this is all the relevant code:

My value tracker looks like this:

class valueTracker(nn.Module):
    def __init__(self, out_channels):
        super(valueTracker, self).__init__()
        self.register_buffer('average', torch.zeros(out_channels, device=gf.device, dtype=torch.double))  # running per-channel average of the gradient

Then in my module

    #in init
        self.values = nn.ModuleList([])
        self.values.append(valueTracker(self.out_channels))
    ...
    def forward(self,x):
        ...
        out = saveAverageD(out, self.values)
        return out

and the original backward hook that is now a function:

def saveAverageD(inp, Values):
    class Saver(torch.autograd.Function):
        @staticmethod
        def forward(ctx, inp):
            return inp
        @staticmethod
        def backward(ctx, grad_out):
            # during the n phase only one set of values needs to be tracked,
            # so save into index 0 even if there are multiple candidates
            with torch.no_grad():
                Values[0].average = Values[0].average * 0.99 + grad_out.sum((0, 2, 3)) * 0.01
                # I also tried: Values[0].average = Values[0].average * 0.99 + grad_out.to(Values[0].average.device).sum((0, 2, 3)) * 0.01
            return grad_out

    return Saver.apply(inp)
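
On a single device, a standalone sanity check like this behaves the way I expect: the gradient passes through untouched and the buffer picks up the running average. (ToyTracker and the toy shapes are just stand-ins so the snippet runs without the rest of my code.)

import torch
import torch.nn as nn

# stand-in for valueTracker so the snippet is self-contained (no gf.device)
class ToyTracker(nn.Module):
    def __init__(self, out_channels):
        super().__init__()
        self.register_buffer('average', torch.zeros(out_channels, dtype=torch.double))

trackers = nn.ModuleList([ToyTracker(4)])

x = torch.randn(2, 4, 8, 8, requires_grad=True)
out = saveAverageD(x, trackers)
out.sum().backward()

print(x.grad.shape)         # torch.Size([2, 4, 8, 8]) -- gradient passes straight through
print(trackers[0].average)  # 0.99 * 0 + 0.01 * grad_out.sum((0, 2, 3)) per channel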

I've written a thorough value checker that runs over 10 epochs and prints every value I can think of. With 1 GPU this works exactly the same way as the register_backward_hook version did. With 2 GPUs the one value that no longer matches is this average: it does not stay consistent with the 1-GPU run. The checker prints the weights at every batch and those stay the same, which means the gradients being computed must also be the same.
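
The checker itself boils down to logging something like this after every batch and diffing the logs from a 1-GPU and a 2-GPU run (simplified sketch; the real checker prints far more):

def log_state(model, step, log):
    # parameters match batch-for-batch between the 1-GPU and 2-GPU runs,
    # but the 'average' buffer drifts apart in the 2-GPU run
    log.append({
        'step': step,
        'params': {n: p.detach().cpu().clone() for n, p in model.named_parameters()},
        'buffers': {n: b.detach().cpu().clone() for n, b in model.named_buffers()},
    })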

Actually, the custom Function approach above was causing a memory leak, so I switched to a register_hook instead. That version also passes my value checker on 1 GPU but still does not work on 2 GPUs. The code for it is as follows:

valueTracker and the module __init__ are the same as above; the only change is in forward, which now registers a tensor hook on the output:

    ...
    def forward(self,x):
        ...
        out.register_hook(lambda grad: saveAverageD(grad, self.values, anotherTensor))  # hook returns None, so the gradient itself is unchanged
        return out

and the original module backward hook that is now a tensor hook:

def saveAverageD(grad_out, Values, additionalTensor):
    with torch.no_grad():
        Values[0].average = Values[0].average * 0.99 + grad_out.sum((0,2,3)) * 0.01
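
For completeness, the 2-GPU runs are just the same network wrapped in nn.DataParallel before training; nothing else changes. Roughly (Net and the shapes are placeholders for my actual model and data):

import torch
import torch.nn as nn

model = Net().to('cuda:0')              # Net stands in for the module shown above
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)      # 2-GPU case; the 1-GPU run skips this line

x = torch.randn(8, 3, 32, 32, device='cuda:0')
out = model(x)                          # forward runs once per replica
out.sum().backward()                    # the tensor hook fires with each replica's grad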

Never got a reply on this. Just replying to bump it.