What is the correct place to apply dropout in Conv and Linear layers?

Hi,

I am a bit confused about where exactly to apply dropout in a CNN.

In the model below I applied dropout in both Conv layers and also in the linear layer.

But I am not sure whether I should apply it before or after the ReLU in the linear layers.

I am also not sure whether I placed dropout in the correct spot in the Conv layers.

I am experimenting with MC dropout on the outputs of the CNN model to compute uncertainty metrics.

I get different mean confidence and uncertainty values depending on whether I apply dropout before or after F.relu for fc1 (a sketch of the MC-dropout loop follows the model below).

    class CNN_dropout(nn.Module):
        def __init__(self):
            super(CNN_dropout, self).__init__()
            self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
            self.fc1 = nn.Linear(320, 50)
            self.fc2 = nn.Linear(50, 10)
            self.drop_layer = nn.Dropout(p=0.5)

        def last_hidden_layer_output(self, x):
            # dropout applied before ReLU in the conv blocks
            x = F.max_pool2d(F.relu(self.drop_layer(self.conv1(x))), 2)
            x = F.max_pool2d(F.relu(self.drop_layer(self.conv2(x))), 2)
            x = x.view(-1, 320)
            # dropout applied before ReLU in fc1
            x = F.relu(self.drop_layer(self.fc1(x)))
            return x

        def forward(self, x):
            x = self.last_hidden_layer_output(x)
            x = self.fc2(x)
            return x
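
For context, here is a minimal sketch of how the MC-dropout passes can be run against a model like this (the mc_dropout_predict helper, the sample count, and the entropy metric are illustrative choices, not a fixed recipe):

    import torch
    import torch.nn.functional as F

    def mc_dropout_predict(model, x, n_samples=20):
        # Keep dropout active at inference: put the model in eval mode,
        # then switch only the Dropout modules back to train mode.
        model.eval()
        for m in model.modules():
            if isinstance(m, torch.nn.Dropout):
                m.train()
        with torch.no_grad():
            # n_samples stochastic forward passes -> softmax probabilities
            probs = torch.stack([F.softmax(model(x), dim=1)
                                 for _ in range(n_samples)])
        mean_probs = probs.mean(dim=0)  # mean confidence per class
        # predictive entropy of the averaged distribution as an uncertainty score
        entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
        return mean_probs, entropy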

My experimental results differ when I switch

from

    def last_hidden_layer_output(self, x):
        x = F.max_pool2d(F.relu(self.drop_layer(self.conv1(x))), 2)
        x = F.max_pool2d(F.relu(self.drop_layer(self.conv2(x))), 2)
        x = x.view(-1, 320)
        x = F.relu(self.drop_layer(self.fc1(x)))
        return x

to

    def last_hidden_layer_output(self, x):
        x = F.max_pool2d(F.relu(self.drop_layer(self.conv1(x))), 2)
        x = F.max_pool2d(F.relu(self.drop_layer(self.conv2(x))), 2)
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = self.drop_layer(x)
        return x

Once I am sure where exactly to apply dropout in the linear layer, maybe I also need to change the placement of dropout in the convolutional layers too?

To me, these seem like the better choice, but unfortunately I am not sure.

    x = F.relu(self.drop_layer(self.fc1(x)))
    x = F.max_pool2d(F.relu(self.drop_layer(self.conv1(x))), 2)

I usually go for this:
x -> conv -> relu -> maxpool -> dropout -> linear

Or should it be like this?

        x = self.drop_layer(F.max_pool2d(F.relu(self.conv1(x)), 2))

and

        x = self.drop_layer(F.relu(self.fc1(x)))
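
Putting those two lines together, the version with dropout after the activation and pooling would look something like this (just a sketch of that placement, not a definitive answer):

    def last_hidden_layer_output(self, x):
        # dropout after the non-linearity and pooling in the conv blocks
        x = self.drop_layer(F.max_pool2d(F.relu(self.conv1(x)), 2))
        x = self.drop_layer(F.max_pool2d(F.relu(self.conv2(x)), 2))
        x = x.view(-1, 320)
        # fc -> relu -> dropout for the hidden linear layer
        x = self.drop_layer(F.relu(self.fc1(x)))
        return x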