Backward hook doesn't save with multiple GPUs

I have a module that saves a value during a backward hook. The hook function does some math on grad_output and then stores the result as self.Val = value. With a single GPU this works fine, but when I run on multiple GPUs I get an error saying the object has no attribute 'Val'. Specifically, Val should keep track of an average gradient over time. When I tried computing that average inside the backward hook function itself, I got RuntimeError: binary_op(): expected both inputs to be on same device, but input a is on cuda:0 and input b is on cuda:1. Now I am trying to just save the gradient in the hook and do the math later, but that does not work with two GPUs either.
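For reference, here is a minimal sketch of the "save the gradient in the hook, do the math later" pattern described above. It is not the original code: the class name GradTracker, the saved list, and average_grad() are hypothetical names chosen for illustration. It also assumes the multi-GPU wrapper is nn.DataParallel, which the post does not state. Under that assumption, each forward pass runs on per-device replicas, so an attribute assigned to a replica inside the hook may never reach the original module. The sketch therefore appends to a list created in __init__ (a mutable object that shallow copies would share) and moves every gradient to the CPU first, so tensors from cuda:0 and cuda:1 are never combined directly. It has only been reasoned through for the single-device case.

```python
import torch
import torch.nn as nn


class GradTracker(nn.Module):
    """Hypothetical module that records grad_output of an inner layer."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Created once here and only mutated in place afterwards, so that
        # any shallow copy of this module refers to the same list object.
        self.saved = []
        self.linear.register_full_backward_hook(self._save_grad)

    def _save_grad(self, module, grad_input, grad_output):
        # Reduce over the batch dimension and move to the CPU so that
        # entries coming from different devices can be combined later.
        self.saved.append(grad_output[0].detach().mean(dim=0).cpu())

    def average_grad(self):
        # The "math later" step: average every gradient saved so far.
        if not self.saved:
            return None
        return torch.stack(self.saved).mean(dim=0)

    def forward(self, x):
        return self.linear(x)
```

After one or more calls to loss.backward(), calling average_grad() on the original (unwrapped) module returns the running average as a CPU tensor, and saved.clear() resets it. The list grows with every backward pass, so a long-running job would want to fold it into a running mean periodically.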