Batch norm and dropout


I’m playing with the MC dropout (Yarin Gal) idea, which inserts a dropout layer after every weight layer. But many pretrained models like ResNet use BatchNorm instead of dropout. Does it still make sense to have both dropout and batchnorm in those models at the same time? Is there a reason why dropout is no longer used in recent architecture designs?


Dropout is generally used in conjunction with fully connected layers. My understanding is that BatchNorm, along with helping the model converge faster, also acts as a regularizer, which serves much the same purpose.
There is some experimental evidence suggesting that dropout may not be well suited for convolutional layers. That said, there is nothing stopping you from using batchnorm and dropout together.
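If you do combine them for MC dropout, one practical detail is worth noting: at inference time you want BatchNorm to use its frozen running statistics (`eval` mode) while the dropout layers stay stochastic. Here is a minimal sketch in PyTorch, assuming a toy model (the `enable_mc_dropout` helper is just an illustration, not a built-in):

```python
import torch
import torch.nn as nn

def enable_mc_dropout(model: nn.Module) -> None:
    """After model.eval(), switch only the Dropout modules back to train mode
    so each forward pass samples a fresh dropout mask."""
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()

# Toy model mixing BatchNorm and Dropout (stand-in for a real pretrained net)
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(32, 10),
)

model.eval()              # BatchNorm now uses running statistics
enable_mc_dropout(model)  # dropout remains active (stochastic)

x = torch.randn(8, 16)
with torch.no_grad():
    # T stochastic forward passes; their spread estimates predictive uncertainty
    samples = torch.stack([model(x) for _ in range(20)])
mean, var = samples.mean(dim=0), samples.var(dim=0)
```

Calling `model.train()` globally instead would also re-enable BatchNorm's batch statistics, which you usually don't want at test time, since single-example or small-batch statistics can badly skew the predictions.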