Can I compare CUDA and CPU results inside a hook?

Hey!
I want to do an accuracy comparison inside global forward and backward hooks:

torch.nn.modules.module.register_module_forward_hook(hook_forward_fn)
torch.nn.modules.module.register_module_full_backward_hook(hook_backward_fn)

In each hook, I want to run every nn.Module once on the CUDA device and once on the CPU.

Then I want to check whether the two results are consistent (within an allowable tolerance).
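To make "within an allowable tolerance" concrete, here is a minimal sketch using `torch.allclose`. The tensors are made-up stand-ins for a CPU result and a CUDA result copied back to the CPU, and the `rtol`/`atol` values are assumptions that would need tuning per dtype and per op.

```python
import torch

# Stand-ins for a CPU result and a device result copied back to the CPU.
cpu_out = torch.tensor([1.0, 2.0, 3.0])
gpu_out = torch.tensor([1.0, 2.0, 3.001])

# allclose checks |a - b| <= atol + rtol * |b| element-wise.
strict = torch.allclose(cpu_out, gpu_out, rtol=1e-5, atol=1e-6)  # fails on the last element
loose = torch.allclose(cpu_out, gpu_out, rtol=1e-5, atol=1e-2)   # passes with a looser atol

# The largest absolute difference is often more useful to log than a bare bool.
max_diff = (cpu_out - gpu_out).abs().max().item()
```

`torch.testing.assert_close` is an alternative that raises with a detailed message instead of returning a bool.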

Could someone guide me on how to do this? Can it be done with hooks (register_module_forward_hook, register_module_full_backward_hook)?

Here’s what I’m thinking (probably all wrong):

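import copy
import torch

# Inside the hook: hook_forward_fn(module, input, output)

# 1. Deep-copy the module. Detach the tensors instead of deep-copying them:
#    copy.deepcopy raises on non-leaf tensors.
module_copy = copy.deepcopy(module)
input_copy = tuple(x.detach() for x in input)  # input is a tuple of positional args
output_copy = output.detach()                  # assumes a single-tensor output

# 2. Move everything to the CPU.
module_copy = module_copy.to('cpu')
input_copy = tuple(x.cpu() for x in input_copy)
output_copy = output_copy.cpu()

# 3. Call the copy directly (module_copy.call does not exist). A global hook
#    also fires for the copy, so the hook needs a guard against re-entry.
with torch.no_grad():
    output_cpu = module_copy(*input_copy)

# 4. Compare the CPU result with the original output (example tolerances).
torch.allclose(output_cpu, output_copy, rtol=1e-4, atol=1e-5)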


This sounds OK. Are you running into any issues?