Is this intended behavior? It doesn't seem portable in both directions: a cpu()'ed model's state_dict can be loaded with a cuda()'ed model's load_state_dict(), but the reverse fails when CUDA is not available (CUDA_VISIBLE_DEVICES=-1). I read the torch.load() documentation, and I understand that a state_dict may contain CUDA-device-related information. So I think it would be good for nn.Module.state_dict() to have an option that converts all output tensors to CPU tensors before saving.
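For reference, this is roughly the workaround I have in mind (a minimal sketch; `to_cpu_state_dict` is just a name I made up, not part of the torch API):

```python
import torch

def to_cpu_state_dict(module):
    # Copy every parameter/buffer tensor in the state_dict to CPU
    # so the checkpoint can be loaded on a machine without CUDA.
    return {name: tensor.cpu() for name, tensor in module.state_dict().items()}

# torch.save(to_cpu_state_dict(model), "model_cpu.pt")
```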
EDIT:
I tried the cpu()-save()-cuda() approach to work around this, but I just found out that the optimizer interface has no cpu() or cuda() method. How can I work around that?
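The only thing I can think of is converting the optimizer's state_dict by hand (a sketch only; `optimizer_state_to_cpu` is a hypothetical helper, not a torch API):

```python
import torch

def optimizer_state_to_cpu(state_dict):
    # Recursively copy every tensor inside an optimizer state_dict
    # (e.g. Adam's exp_avg / exp_avg_sq buffers) to CPU.
    def convert(obj):
        if torch.is_tensor(obj):
            return obj.cpu()
        if isinstance(obj, dict):
            return {k: convert(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [convert(v) for v in obj]
        return obj
    return convert(state_dict)

# torch.save(optimizer_state_to_cpu(optimizer.state_dict()), "optim_cpu.pt")
```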
My actual problem is that I just want to check whether my model is learning correctly, but I'm getting an out-of-memory error on the GPU. CUDA_VISIBLE_DEVICES=-1 resolves the OOM error but raises the loading error described above.
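For now I'm considering remapping the checkpoint to CPU at load time instead, using torch.load()'s documented map_location argument (a sketch, assuming `model` and `optimizer` already exist and the checkpoints were saved with torch.save()):

```python
import torch

# Remap all CUDA storages in the checkpoint to CPU at load time,
# so it can be opened on a machine where CUDA is unavailable.
state = torch.load("model.pt", map_location=lambda storage, loc: storage)
model.load_state_dict(state)

# The optimizer's state_dict can be remapped the same way.
opt_state = torch.load("optim.pt", map_location=lambda storage, loc: storage)
optimizer.load_state_dict(opt_state)
```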