Adding dropout after training

Hi all,

I have a question for you. Do you know if it is possible to add dropout layers to an already trained neural network? I have a model trained without dropout, but I want to use dropout at inference time to estimate uncertainty.

I managed to copy the weights from my trained model into a model with dropout layers, but the predictions are messed up. This is probably because in PyTorch, when dropout is in train mode, it scales the surviving activations by 1/(1-p).
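Roughly what I did, as a minimal sketch (the real model is bigger; the layer names and sizes here are made up):

```python
from collections import OrderedDict

import torch.nn as nn

# The originally trained model, without dropout.
trained = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(10, 20)),
    ("relu", nn.ReLU()),
    ("fc2", nn.Linear(20, 2)),
]))

# Same layers with a dropout inserted in between. nn.Dropout has no
# parameters, so the state_dict keys ("fc1.weight", ...) still match.
with_dropout = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(10, 20)),
    ("relu", nn.ReLU()),
    ("drop", nn.Dropout(p=0.5)),
    ("fc2", nn.Linear(20, 2)),
]))

with_dropout.load_state_dict(trained.state_dict())
```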

Do you know if there is any way to deal with this issue? Or is it simply wrong to add dropout to an already trained model without retraining it?

Thanks in advance for your help

Indeed, training with dropout needs to account for scaling. There are two equivalent strategies: multiply the weights by the keep probability p after training, or multiply the surviving activations by 1/p during training (“inverted” dropout). I don’t know which one PyTorch uses.

If you need to apply dropout during inference, you therefore need to compensate for the missing nodes by scaling the affected layers by 1/p. The predictions will obviously be perturbed, but that is what you want to measure.

E.g.: if p = 1/5, you multiply the activations by 5, which simulates having all nodes active in the layer. See this answer for more details.
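You can check empirically which convention a framework uses. For PyTorch (where p is the drop probability rather than the keep probability), something like:

```python
import torch
import torch.nn.functional as F

x = torch.ones(10_000)
y = F.dropout(x, p=0.8, training=True)  # drop 80% of the units

# If the survivors print as 5.0, scaling happens at training time
# (inverted dropout); if they print as 1.0, you have to rescale
# at test time yourself.
print(y[y != 0].unique())
print(y.mean())  # should stay close to 1.0 if scaling is applied
```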

Exactly. What PyTorch does is account for scaling at training time: it multiplies the surviving activations by 1/(1 - p_drop) during training and leaves the weights untouched at test time.

This means that if I insert dropout layers into my trained model and keep them in train mode at test time, PyTorch already scales the activations correctly to account for the missing nodes.

Now, my intuition is that the results are worse than I expected because the network weights were not trained to be “robust” to dropped nodes, since dropout layers were not present during training. Hence, if adding dropout at test time gives bad predictions, that means my model is extremely uncertain about its predictions (I am using Monte Carlo dropout to estimate uncertainty).
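For reference, this is roughly how I am sampling (a simplified sketch; `n_samples` and using the standard deviation as the uncertainty measure are my own choices):

```python
import torch

def mc_dropout_predict(model, x, n_samples=50):
    """Monte Carlo dropout: keep dropout active at test time and
    aggregate several stochastic forward passes."""
    model.eval()
    # Re-enable only the dropout layers; everything else stays in eval
    # mode (e.g. batch norm keeps using its running statistics).
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)  # predictive mean / spread
```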

I don’t know if you are familiar with this technique; if you are, do you think my deduction is correct?

Thanks for your help :)

I haven’t used this approach in practice, but I understand the procedure. I doubt you can measure confidence using dropout, though; what you can evaluate is the network’s resilience to co-adaptation between neurons, and therefore how well it is regularized…

As to the actual procedure, I think you really need to scale by hand, because as you said, PyTorch’s dropout layers don’t do anything at test time (in eval mode). Otherwise you will have fewer active nodes in the layer, which won’t contribute as much to the next layer’s input as in the default network, and this will produce very different results than the untouched network.
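A minimal sketch of that manual masking and rescaling (using p as the drop probability here):

```python
import torch

def manual_dropout(x, p_drop=0.5):
    # Zero each unit with probability p_drop, then rescale the survivors
    # by 1/(1 - p_drop) so their expected contribution to the next layer
    # matches the fully active network.
    mask = (torch.rand_like(x) >= p_drop).float()
    return x * mask / (1.0 - p_drop)
```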

Note that it’s easier to use nn.functional.dropout with training=True if you want dropout active during evaluation, as model.eval() will deactivate the nn.Dropout layers…
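For example, with a made-up toy module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        # training=True keeps dropout active even after model.eval(),
        # and the 1/(1 - p) rescaling is applied for you.
        x = F.dropout(x, p=0.5, training=True)
        return self.fc2(x)
```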
