When I train with a single GPU, I get a 'CUDA out of memory' error,
so I added a second GPU with nn.DataParallel(model),
but the out-of-memory error still occurs.
I searched Google and found the output_device setting, so I tried:

model = nn.DataParallel(model, output_device=1)
model2 = nn.DataParallel(model2, output_device=1)
but now I get the runtime error below.
How can I solve this, other than by decreasing the batch size?
RuntimeError: Assertion `THCTensor_(checkGPU)(state, 4, input, target, output, total_weight)' failed. Some of weight/gradient/input tensors are located on different GPUs. Please move them to a single one. at /pytorch/aten/src/THCUNN/generic/ClassNLLCriterion.cu:28
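For context, the assertion comes from ClassNLLCriterion: with output_device=1 the gathered model output lives on cuda:1, so if the target tensor stays on another GPU the loss sees tensors on different devices. Below is a minimal sketch (the toy model, shapes, and variable names are assumptions, not my real code) of keeping the target on the same device as the output; it falls back to CPU when fewer than two GPUs are present.

```python
import torch
import torch.nn as nn

use_dp = torch.cuda.device_count() > 1  # only wrap in DataParallel with 2+ GPUs

# Toy stand-in for the real model (an assumption for illustration).
model = nn.Sequential(nn.Linear(8, 4), nn.LogSoftmax(dim=1))
if use_dp:
    # Outputs from all replicas are gathered onto cuda:1.
    model = nn.DataParallel(model, output_device=1).cuda()

x = torch.randn(16, 8)
target = torch.randint(0, 4, (16,))
if use_dp:
    x = x.cuda(0)  # DataParallel scatters the input batch from the default GPU

output = model(x)                  # on cuda:1 under DataParallel, else CPU
target = target.to(output.device)  # the fix: move the target to the output's device
loss = nn.NLLLoss()(output, target)
print(loss.item())
```

Note that output_device only changes where outputs are gathered; DataParallel reduces per-GPU memory mainly because each replica processes batch_size / num_gpus samples, so the batch must be large enough to split.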