I have an error:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper__native_layer_norm_backward)
```
I’m trying to find what’s not on the GPU. The forward pass runs without error; this message appears only during the backward pass. I’ve checked that all my data and the model are on the GPU, and that I initialised the optimiser with the parameters already on the device, so I don’t see what could still be on the CPU. I’ve tried checking the device of every tensor in the grad_fn tree with the [getBack() function](https://stackoverflow.com/questions/52988876/how-can-i-visualize-what-happens-during-loss-backward), but that is far too slow (too many steps). What else can I do to locate the issue?
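For reference, the device check I ran over the model looks roughly like this (the helper name `find_off_device` is my own); it lists every parameter and buffer that is not on the expected device, and it comes back empty for me:

```python
import torch
import torch.nn as nn

def find_off_device(model, expected="cuda:0"):
    """List every parameter and buffer of `model` whose device differs from `expected`."""
    expected = torch.device(expected)
    wrong = []
    for name, t in list(model.named_parameters()) + list(model.named_buffers()):
        if t.device != expected:
            wrong.append((name, str(t.device)))
    return wrong

# Example: a small model left on the CPU.
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8))
print(find_off_device(model, "cpu"))  # [] — everything is where it should be
```

So as far as `named_parameters()` and `named_buffers()` can see, nothing is on the CPU, yet the backward pass still fails.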