I implemented a memory-efficient DenseNet for PyTorch v0.4.0, and it works fine on a single GPU. However, it fails in the multi-GPU case: the error occurs during the backward pass of an nn.DataParallel model:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
I suspect the error is caused by modifying intermediate variables that share storage, but I cannot locate the offending inplace operation, since all the inplace operations work fine in the single-GPU case. Does nn.DataParallel perform additional gradient checking during backward?
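For reference, here is a minimal, self-contained sketch (not the DenseNet code itself) of how this RuntimeError arises in general: some ops, such as sigmoid, save their output for the backward pass, so modifying that output in place invalidates the saved tensor.

```python
import torch

# sigmoid's backward pass needs its own output, so autograd saves it.
a = torch.ones(3, requires_grad=True)
b = torch.sigmoid(a)   # autograd saves b for the backward pass
b.add_(1)              # inplace op overwrites the saved tensor
try:
    b.sum().backward()
except RuntimeError as e:
    # "... one of the variables needed for gradient computation
    #  has been modified by an inplace operation"
    print(e)
```

In my case the inplace ops are deliberate (that is where the memory savings come from), so the question is why the same graph only trips this check under nn.DataParallel.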
Besides, I ran a similar implementation on PyTorch v0.3.1, and it works well with nn.DataParallel.
I have opened an issue on the project's repository; the code there can be used to reproduce the error.