I’m using nn.parallel.data_parallel to train the model. With a single GPU there is no problem, but with multiple GPUs I get the error “save_for_backward can only save input or output tensors, but argument 0 doesn’t satisfy this condition”. Has anyone run into this problem?
We could help better if we could see your code. You most likely have a custom autograd.Function that is not calling save_for_backward correctly.
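For illustration, here is a minimal sketch of a custom autograd.Function (using the modern static-method style; the `Square` name and the example values are my own, not from this thread). The usual cause of that error in older PyTorch versions was passing something to save_for_backward that was neither an input nor an output of forward, such as a freshly computed intermediate tensor:

```python
import torch
from torch.autograd import Function

class Square(Function):
    @staticmethod
    def forward(ctx, x):
        # Correct: x is an input of forward, so it is safe to save.
        # In older PyTorch, saving an intermediate tensor instead,
        # e.g. ctx.save_for_backward(x * 2), raised
        # "save_for_backward can only save input or output tensors, ..."
        ctx.save_for_backward(x)
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        # Retrieve the saved input and apply the chain rule: d(x^2)/dx = 2x.
        (x,) = ctx.saved_tensors
        return grad_output * 2 * x

x = torch.tensor([3.0], requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # tensor([6.])
```

If your Function saves intermediates, either recompute them in backward from the saved inputs, or stash them on ctx as plain attributes (for non-tensor data). Posting the forward/backward of your Function would make the multi-GPU failure much easier to diagnose.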