Inplace Errors with Dropout layers with PyTorch 1.9, but not with PyTorch 1.10

I’m currently working with the 3detr repo (https://github.com/facebookresearch/3detr), which officially supports only PyTorch 1.9. When I switched to PyTorch 1.10, I got an error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [256, 1, 256]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True)
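As the hint suggests, anomaly mode can be enabled before running the failing forward/backward so that backward() also prints the traceback of the forward op whose output was later modified in place (a minimal sketch, not 3detr code):

```python
import torch

# Enable anomaly detection globally; backward() will now report which
# forward-pass operation produced the tensor that was modified in place.
torch.autograd.set_detect_anomaly(True)

# ... run the failing forward and backward pass here ...

# Turn it off again afterwards, since anomaly mode slows training down.
torch.autograd.set_detect_anomaly(False)
```

The same thing is available as a context manager, `with torch.autograd.detect_anomaly(): ...`, which limits the overhead to the wrapped region.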

Turning on anomaly detection, I managed to trace the errors to the dropout layers, which I changed to work with PyTorch 1.10 like this:
self.dropout = nn.Dropout(dropout, inplace=False)  # inplace originally True; set to False for PyTorch 1.10 compatibility
Since posting large sections of the 3detr repo here wouldn’t be easy to read (and I would have to post a lot of code), I instead created a pull request on the 3detr repo with the changes: (Pull Request Here)
However, the dropout layers would return errors on lines like this: src2 = self.linear2(self.dropout(self.activation(self.linear1(src2))))
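That failing line can be reduced to a small standalone reproduction (a sketch with made-up layer sizes, not the actual 3detr module): an inplace dropout applied to a ReLU output modifies the tensor that ReLU’s backward needs, so backward() raises the version-counter RuntimeError on PyTorch >= 1.10.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
linear1, linear2 = nn.Linear(8, 8), nn.Linear(8, 8)
activation = nn.ReLU()
dropout = nn.Dropout(0.5, inplace=True)  # inplace=True, as in the original 3detr code

x = torch.randn(4, 8)
# Same pattern as the failing line: linear2(dropout(activation(linear1(x))))
out = linear2(dropout(activation(linear1(x))))

try:
    out.sum().backward()
    print("backward succeeded")
except RuntimeError as e:
    # On PyTorch >= 1.10 this prints the "modified by an inplace operation" error,
    # because the inplace dropout bumped the version counter of the ReLU output.
    print("backward failed:", e)
```

Setting inplace=False on the dropout (or removing the inplace op after the ReLU) makes backward() succeed, which matches the workaround above.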

My first question is: what changed between PyTorch 1.9 and PyTorch 1.10 that would cause errors like this to occur in 1.10 but not in 1.9?
My second question is: how would I fix these errors without setting inplace to False (this is undesirable for me, since it increases train time per batch by around 2x!) or rolling back to 1.9? And if that isn’t possible, is there any way to ignore the RuntimeErrors?

I would guess this PR, which changed the nn.ReLU backward pass, might have disallowed the following dropout layer from manipulating the outputs inplace, as they are now used during backward().

It’s strange to hear you are seeing a 2x slowdown. Could you give more information about your device and setup? I think the slowdown might be unrelated to the PR.

Of course: my hardware setup is a 6-core CPU (8400) and a 1060 GPU with 3 GB of VRAM, so it’s a tad limited in compute power and VRAM.

Any ideas on how to solve this issue while still running with inplace=True? Switching inplace to False required me to scale down the model layers, since inplace reduced VRAM usage by a decent amount for me, which was nice to have. I could get rid of the dropout layers to avoid the errors, but that is a bit of a non-ideal solution.

The corresponding PR was merged on Aug 26th, so you could compare the performance using the nightly binaries from the day before and the day after the merge. If you are still seeing the issue, this would narrow it down to this PR.
It’s still strange that you needed to scale down the model layers (I assume you needed to reduce the memory usage), as the PR should save memory, and naively I would assume the inplace dropout would save the same amount of memory (I haven’t looked into the code deeply yet).

A short while ago I wrote a little blog post about the nature of the memory savings from inplace operations. The synopsis is that inplace will only save temporary allocations, so it should not matter much. You might be running into the case mentioned in sidenote 2, where you want to run gc.collect(). Another option could be to use the JIT to take the computation out of Python.
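For what it’s worth, calling gc.collect() inside the training loop could look like this (a minimal sketch with a toy model, not 3detr code):

```python
import gc
import torch

model = torch.nn.Linear(64, 64)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for step in range(3):
    loss = model(torch.randn(16, 64)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # If reference cycles are keeping old tensors alive, an explicit
    # collection releases that memory promptly instead of waiting for
    # Python's cyclic garbage collector to run on its own schedule.
    gc.collect()
```

Note that gc.collect() itself has some CPU cost, so you may want to call it every N steps rather than every step.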

Best regards

Thomas

I see. I’m not quite sure how to install a nightly build from a specific day; are there any guides/instructions on how this could be done?
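For reference, date-pinned nightly wheels can typically be installed with pip by spelling out the dev version string; the exact version numbers and the CUDA variant below are assumptions based on PyTorch’s nightly naming scheme for that period, so check the nightly index for the strings that actually exist:

```shell
# Nightly from the day before the PR was merged (Aug 25th):
pip install --pre torch==1.10.0.dev20210825 -f https://download.pytorch.org/whl/nightly/torch_nightly.html

# Nightly from the day after the merge (Aug 27th):
pip install --pre torch==1.10.0.dev20210827 -f https://download.pytorch.org/whl/nightly/torch_nightly.html
```

Installing each one in a separate virtual environment makes it easy to benchmark the two builds side by side.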

Thank you for your advice. I’ll try running gc.collect() and see if that helps with memory, but would the JIT make up for the 2x batch times I’m seeing? Personally, I didn’t expect such a big difference between inplace and no inplace.

The JIT might eliminate the need to use gc.collect() to free the memory, but it won’t help with the timing. For that, I would suggest using the profiler to see what exactly is getting slower; it is not clear to me that the slowdown is necessarily linked to the inplace change.
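A minimal profiler run could look like this (a sketch with a toy model standing in for the 3detr layers); comparing the op-level tables for inplace=True vs. inplace=False should show where the extra time goes:

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(
    torch.nn.Linear(256, 256),
    torch.nn.ReLU(),
    torch.nn.Dropout(0.1),  # flip inplace here and compare the two profiles
    torch.nn.Linear(256, 256),
)
x = torch.randn(32, 256)

# Profile one forward/backward pass on the CPU
# (add ProfilerActivity.CUDA when running on the GPU).
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x).sum().backward()

# Print the ops sorted by total CPU time to see which ones dominate.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```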

Best regards

Thomas