How to investigate a “not on the same device” error

I have an error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper__native_layer_norm_backward)

I’m trying to find what’s not on the GPU. The forward pass runs with no error; this message appears only during the backward pass. I’ve checked that all my data and the model are on the GPU, and that I initialised the optimiser with the parameters already on the device. I don’t see what could still be on the CPU. I’ve tried checking the device of every tensor in the grad_fn tree with the [getBack() function](https://stackoverflow.com/questions/52988876/how-can-i-visualize-what-happens-during-loss-backward), but that is way too slow (too many steps). What else can I do to find where the issue is?
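
For reference, this is roughly the check I’m running over the model (a sketch; `model` here is my `nn.Module` and `cuda:0` is the target device):

```python
import torch

target = torch.device("cuda:0")

# List every parameter and buffer that is not on the target device.
offenders = [
    (name, tensor.device)
    for name, tensor in list(model.named_parameters()) + list(model.named_buffers())
    if tensor.device != target
]
print(offenders or "all parameters/buffers on cuda:0")
```

It comes back empty, so the parameters and buffers themselves seem fine.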

Based on the stacktrace it seems the weight of a LayerNorm layer is not on the same device as the gradient input to it. I would probably try to isolate this layer and check where the inputs/outputs are coming from. A minimal code snippet to reproduce the issue would be great for further debugging.
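
As a rough sketch (the module path below is just a placeholder; substitute the actual attribute path to the suspect LayerNorm in your model):

```python
import torch

# Hypothetical path to the suspect layer; replace with the real one from your model.
ln = model.encoder.layer[-1].output.LayerNorm

# Drive the layer in isolation with a GPU input and run a backward pass through it.
x = torch.randn(2, 16, ln.normalized_shape[-1], device="cuda:0", requires_grad=True)
ln(x).sum().backward()

print(ln.weight.device, ln.bias.device, x.device)
```

If this runs without the error, the layer itself is fine in isolation and the CPU tensor must be coming from somewhere upstream in the real forward pass.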


It’s a layer in a pretrained transformer model. Possibly the last layer.

I’ve checked the layers one by one and they have all been loaded on the GPU. I’ll try to cobble together a code snippet that reproduces the error.

Thanks!

Once you’ve narrowed down the layers, add print statements to check the .device attribute of the input activation as well as the layer parameters. Forward hooks might work for this check.
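
A rough sketch, assuming `model` is the transformer and everything is supposed to live on `cuda:0`:

```python
import torch
import torch.nn as nn

def device_check_hook(name):
    def hook(module, inputs, output):
        # Devices of the incoming activations and of this module's own parameters.
        in_devs = [t.device for t in inputs if isinstance(t, torch.Tensor)]
        param_devs = {n: p.device for n, p in module.named_parameters(recurse=False)}
        print(f"{name}: inputs on {in_devs}, params on {param_devs}")
    return hook

# Register the hook on every LayerNorm in the model.
handles = [
    module.register_forward_hook(device_check_hook(name))
    for name, module in model.named_modules()
    if isinstance(module, nn.LayerNorm)
]

# Run a single forward pass here, read the printout, then remove the hooks.
for h in handles:
    h.remove()
```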