When using dropout, how does the model's performance change if we set the inplace parameter to True instead of False, both in terms of training and validating the model?
The inplace operation (assuming it’s allowed and doesn’t raise an error) would save the memory for the intermediate output activation, but would prevent potentially fusing this dropout layer with other layers if I’m not mistaken.
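As a rough illustration of the memory point, here is a minimal sketch (the tensor shapes and variable names are made up for this example). It checks whether the dropout output reuses the input's storage and shows that dropout is a no-op in eval mode regardless of the inplace setting:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two dropout layers that differ only in the inplace flag.
drop_out_of_place = nn.Dropout(p=0.5, inplace=False)
drop_in_place = nn.Dropout(p=0.5, inplace=True)

# Training mode (the default for a freshly created module).
x = torch.ones(4, 8)
y = drop_out_of_place(x)
# The out-of-place version allocates a new tensor for the output.
out_of_place_shares_memory = y.data_ptr() == x.data_ptr()

x2 = torch.ones(4, 8)
y2 = drop_in_place(x2)
# The inplace version overwrites x2, so no extra activation is stored.
in_place_shares_memory = y2.data_ptr() == x2.data_ptr()

# Eval mode: dropout does nothing, so the values pass through unchanged.
drop_in_place.eval()
x3 = torch.ones(4, 8)
y3 = drop_in_place(x3)
eval_is_identity = torch.equal(y3, torch.ones(4, 8))

print(out_of_place_shares_memory, in_place_shares_memory, eval_is_identity)
```

Note that after the inplace call `x2` no longer holds the original values, so this only works if nothing else needs the unmodified input.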
Sorry, I didn’t get it. Can you please explain in terms of testing and validating? What does “fusing” the dropout layer refer to?
Multiple operations can be fused if you are scripting the model via torch.jit.script. Whether an operation can be fused with its neighbors depends on the actual operations as well as the fuser backend used (e.g. nvfuser should be able to fuse dropout-add-relu or other pointwise operations).
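To make the scripting part concrete, here is a small sketch of a dropout-add-relu pattern passed through torch.jit.script. The module name `DropoutAddReLU` and the shapes are hypothetical, chosen only for this example. Whether anything is actually fused depends on the backend, the device, and the PyTorch version, so this only shows how the model would be scripted, not that fusion happens:

```python
import torch
import torch.nn as nn


class DropoutAddReLU(nn.Module):
    """Hypothetical module with a dropout -> add -> relu chain of pointwise ops."""

    def __init__(self, p: float = 0.1):
        super().__init__()
        # Out-of-place dropout, so a fuser is free to combine it with its neighbors.
        self.dropout = nn.Dropout(p=p, inplace=False)

    def forward(self, x: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.dropout(x) + residual)


model = DropoutAddReLU()
scripted = torch.jit.script(model)  # compile the module to TorchScript

# In eval mode dropout is disabled, so the output is just relu(x + residual).
scripted.eval()
x = torch.randn(2, 4)
residual = torch.randn(2, 4)
out = scripted(x, residual)
```

You could inspect `scripted.graph` to see the recorded operations.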
I’m not sure if you are referring to the training and validation loops, but inplace ops would work the same way in both, with the added benefit that Autograd would not complain about disallowed inplace manipulations during the validation run if the forward pass is wrapped in a