Dropout and inplace

Vidit_Agarwal · January 8, 2022, 5:52am

When we are using dropout, what will be the difference in the performance of our model if we set the inplace parameter to True instead of False, both in terms of training and validating the model?

ptrblck · January 12, 2022, 5:47am

The inplace operation (assuming it’s allowed and doesn’t raise an error) would save the memory for the intermediate output activation, but could potentially prevent this dropout layer from being fused with other layers, if I’m not mistaken.
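A minimal sketch of both points, assuming a recent PyTorch on CPU (the tensor shapes and the sigmoid example are illustrative, not from the thread). With `inplace=True` the returned tensor shares storage with the input, so no separate output activation is allocated. The "assuming it's allowed" caveat shows up when the tensor being overwritten was saved by autograd for the backward pass: sigmoid saves its output, so dropping out that output in place makes `backward()` raise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# 1) Memory: inplace=True overwrites the input instead of allocating an output.
x = torch.randn(4, 8)
original = x.clone()
drop = nn.Dropout(p=0.5, inplace=True)  # modules start in training mode
out = drop(x)
shares_storage = out.data_ptr() == x.data_ptr()
print("output shares storage with input:", shares_storage)
# Each element of x was either zeroed or scaled by 1 / (1 - p) = 2.
zero_or_scaled = bool(torch.all((x == 0) | (x == 2 * original)))

# 2) When it is not allowed: sigmoid saves its output for backward, so
# modifying that output in place invalidates the saved tensor.
w = torch.randn(8, requires_grad=True)
h = torch.sigmoid(w)
F.dropout(h, p=0.5, training=True, inplace=True)
backward_failed = False
try:
    h.sum().backward()
except RuntimeError as err:
    backward_failed = True
    print("backward raised:", str(err).splitlines()[0])
```

Placing in-place dropout right after a layer whose backward does not need its own output (e.g. an nn.Linear) avoids the second problem.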

Vidit_Agarwal · January 12, 2022, 8:06am

Sorry, I didn’t get it. Can you please explain in terms of testing and validation? What does “fusing” the dropout layer refer to?

ptrblck · January 12, 2022, 8:51am

Multiple operations can be fused if you are scripting the model via torch.jit.script. Whether an operation can be fused with its neighbors depends on the actual operations as well as the fuser backend used (e.g. nvfuser should be able to fuse dropout-add-relu and other pointwise operations).
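To make "fusing" concrete, here is a hypothetical dropout-add-relu block written as a TorchScript function (the name `dropout_add_relu` and the shapes are illustrative). It is a chain of pointwise ops, the pattern fusers target. Whether the ops are actually fused at runtime depends on the PyTorch version, the device, and which fuser is enabled (nvfuser targets CUDA, and its TorchScript integration has been deprecated in recent releases), so on CPU this will most likely still run op by op. The out-of-place F.dropout call is used on purpose, since an in-place op may get in the way of fusion.

```python
import torch
import torch.nn.functional as F

@torch.jit.script
def dropout_add_relu(x: torch.Tensor, residual: torch.Tensor,
                     p: float, training: bool) -> torch.Tensor:
    # Out-of-place dropout -> add -> relu: a chain of pointwise ops
    # that a fuser backend may combine into a single kernel.
    return F.relu(F.dropout(x, p=p, training=training) + residual)

torch.manual_seed(0)
x = torch.randn(4, 8)
residual = torch.randn(4, 8)

# Evaluation mode: dropout is the identity.
eval_out = dropout_add_relu(x, residual, 0.5, False)

# Training mode: every element is relu(residual) (dropped) or
# relu(2 * x + residual) (kept and scaled by 1 / (1 - p)).
train_out = dropout_add_relu(x, residual, 0.5, True)

# The TorchScript IR that a fuser would operate on.
print(dropout_add_relu.graph)
```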

I’m not sure if you are referring to the training and validation loop, but inplace ops would work the same way there, with the benefit that Autograd would not complain about disallowed inplace manipulations during the validation run if the forward pass is wrapped in a torch.no_grad() context.
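A minimal sketch of that training/validation setup, assuming a toy model (the layer sizes, `model`, `leaf`, and the random data are illustrative only). During training, nn.Dropout(inplace=True) sits right after an nn.Linear, whose backward pass does not need its own output, so overwriting that output is safe. For validation, model.eval() turns dropout into a no-op and torch.no_grad() stops autograd from recording anything; the last lines show an in-place op that autograd rejects in grad mode (modifying a leaf that requires grad) going through under no_grad.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model = nn.Sequential(
    nn.Linear(8, 16),
    nn.Dropout(p=0.5, inplace=True),  # overwrites the Linear output in place
    nn.ReLU(),
    nn.Linear(16, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x_train, y_train = torch.randn(32, 8), torch.randn(32, 1)
x_val = torch.randn(8, 8)

# Training step: dropout is active and gradients flow through the in-place op.
model.train()
optimizer.zero_grad()
loss = F.mse_loss(model(x_train), y_train)
loss.backward()
optimizer.step()

# Validation: eval() disables dropout, no_grad() disables graph recording.
model.eval()
with torch.no_grad():
    val_a = model(x_val)
    val_b = model(x_val)

# In grad mode, autograd rejects in-place ops on a leaf that requires grad...
leaf = torch.randn(8, requires_grad=True)
raised_with_grad = False
try:
    F.dropout(leaf, p=0.5, training=True, inplace=True)
except RuntimeError:
    raised_with_grad = True

# ...but under no_grad the same in-place op is allowed.
with torch.no_grad():
    F.dropout(leaf, p=0.5, training=True, inplace=True)
```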