When and where should I use these layers in my CNN model? What should the dropout rate be?
I don't think there is a single best answer, so you could refer to some “standard” models, e.g. ResNet-like models or newer architectures. While the nonlinearity was often applied directly after the conv layers, you will also see models where it's applied after the batchnorm layer.
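As a rough illustration of the second ordering, here is a minimal PyTorch sketch that applies the nonlinearity after the batchnorm layer and puts dropout at the end of the block. The `ConvBlock` name, the channel sizes, and the default `drop_rate=0.1` are my own assumptions for the example, not a recommendation from any particular architecture:

```python
import torch
import torch.nn as nn


class ConvBlock(nn.Module):
    """Hypothetical block: Conv -> BatchNorm -> ReLU -> Dropout2d."""

    def __init__(self, in_channels, out_channels, drop_rate=0.1):
        super().__init__()
        self.block = nn.Sequential(
            # The conv bias is redundant when a batchnorm layer follows directly.
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            # Dropout2d zeroes whole feature maps; drop_rate is an assumed value.
            nn.Dropout2d(p=drop_rate),
        )

    def forward(self, x):
        return self.block(x)


model = ConvBlock(3, 16, drop_rate=0.1)
model.eval()  # dropout is disabled and batchnorm uses running stats in eval mode
with torch.no_grad():
    out = model(torch.randn(2, 3, 32, 32))
out_shape = tuple(out.shape)
```

Swapping the order of the `nn.BatchNorm2d` and `nn.ReLU` entries would give the other variant, so it is easy to try both and compare.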
The drop rate can be treated as a hyperparameter, and you could use the validation loss to tune it.
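A simple way to do that is a small sweep over candidate rates, keeping the one with the lowest validation loss. The sketch below is only an outline: `pick_drop_rate` and `validation_loss_for` are hypothetical names, and `fake_validation_loss` is a stand-in for a real training run so that the example executes on its own:

```python
def pick_drop_rate(candidates, validation_loss_for):
    """Return the candidate drop rate with the lowest validation loss.

    validation_loss_for is a hypothetical callable: in practice it would
    train a model with the given drop rate and return its validation loss.
    """
    results = {p: validation_loss_for(p) for p in candidates}
    best = min(results, key=results.get)
    return best, results


# Stand-in for a real training run (assumption, for illustration only).
def fake_validation_loss(p):
    return (p - 0.3) ** 2 + 1.0


best_rate, losses = pick_drop_rate([0.0, 0.1, 0.3, 0.5], fake_validation_loss)
```

In a real setup you would replace `fake_validation_loss` with a function that trains your model and evaluates it on a held-out validation split, never on the test set.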
Thank you so much