Measuring uncertainty using MC Dropout

Ka_Hina · August 5, 2020, 6:33pm

I am trying to implement a Bayesian CNN using MC Dropout in PyTorch.
The main idea is that by applying dropout at test time and running many forward passes, you get predictions from a variety of different models.
I found an implementation of MC Dropout, but I do not understand how it applies the method or how exactly it chooses the final prediction from the list of predictions.

Here is the code:

import torch
import torch.nn.functional as F
from torch.autograd import Variable

def mcdropout_test(model):
    model.train()  # keep dropout active at test time
    test_loss = 0
    correct = 0
    T = 100  # number of stochastic forward passes per batch
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output_list = []
        for i in range(T):
            output_list.append(torch.unsqueeze(model(data), 0))
        output_mean = torch.cat(output_list, 0).mean(0)  # average over the T passes
        test_loss += F.nll_loss(F.log_softmax(output_mean, dim=1), target, reduction='sum').data  # sum up batch loss
        pred = output_mean.data.max(1, keepdim=True)[1]  # get the index of the max log-probability
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()
    test_loss /= len(test_loader.dataset)
    print('\nMC Dropout Test set: Average loss: {:.4f}, Accuracy: {}/{} ({:.2f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

train()
mcdropout_test(model)

I have replaced

data, target = Variable(data, volatile=True), Variable(target)

by adding

with torch.no_grad():

at the beginning.

And this is how I have defined my CNN:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)
        self.dropout = nn.Dropout(p=0.3)

        # Xavier initialization for the weights, zeros for the biases
        nn.init.xavier_uniform_(self.conv1.weight)
        nn.init.constant_(self.conv1.bias, 0.0)
        nn.init.xavier_uniform_(self.conv2.weight)
        nn.init.constant_(self.conv2.bias, 0.0)
        nn.init.xavier_uniform_(self.fc1.weight)
        nn.init.constant_(self.fc1.bias, 0.0)
        nn.init.xavier_uniform_(self.fc2.weight)
        nn.init.constant_(self.fc2.bias, 0.0)
        nn.init.xavier_uniform_(self.fc3.weight)
        nn.init.constant_(self.fc3.bias, 0.0)

    def forward(self, x):
        x = self.pool(F.relu(self.dropout(self.conv1(x))))  # recommended to add the relu
        x = self.pool(F.relu(self.dropout(self.conv2(x))))  # recommended to add the relu
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(self.dropout(x)))
        x = self.fc3(self.dropout(x))
        return x

Can anyone help me get the right implementation of the Monte Carlo Dropout method for a CNN?
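
As a small illustration of "applying dropout at test time": the sketch below switches only the dropout layers back to training mode while the rest of the model stays in eval mode. The helper name enable_dropout and the small nn.Sequential stand-in model are assumptions made for this example and are not part of the code from the post. Calling model.train() on the whole model, as in the code above, has the same effect for a network without batch norm layers.

```python
import torch
import torch.nn as nn


def enable_dropout(model: nn.Module) -> None:
    """Put only the dropout layers into training mode (hypothetical helper)."""
    for module in model.modules():
        if isinstance(module, (nn.Dropout, nn.Dropout2d)):
            module.train()


torch.manual_seed(0)

# Small stand-in network, not the Net class from the post.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(16, 10))
x = torch.randn(4, 8)

model.eval()  # dropout disabled: repeated passes give identical outputs
with torch.no_grad():
    eval_a, eval_b = model(x), model(x)

enable_dropout(model)  # dropout active again, all other layers stay in eval mode
with torch.no_grad():
    mc_a, mc_b = model(x), model(x)

deterministic_in_eval = torch.equal(eval_a, eval_b)
stochastic_with_dropout = not torch.equal(mc_a, mc_b)
print(deterministic_in_eval, stochastic_with_dropout)
```

Each pass with active dropout samples a different random mask, which is what makes the T forward passes behave like predictions from different models.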

ptrblck · August 8, 2020, 10:28am

After 100 iterations, the authors are calculating the mean prediction in:

output_mean = torch.cat(output_list, 0).mean(0)

and use this tensor to calculate the test_loss as well as the predictions.

PS: You can post code snippets by wrapping the complete code block into three backticks ```, which makes debugging a bit easier.
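
To make the averaging step described in this reply concrete, here is a minimal sketch under stated assumptions: the helper name mc_dropout_predict, the small stand-in model, and the random inputs are made up for illustration. It averages the raw model outputs over T passes and then applies log_softmax for the loss, as in the code from the question. The standard deviation across passes is added only as one possible rough indicator of uncertainty and is not part of the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def mc_dropout_predict(model, data, T=100):
    """Run T stochastic forward passes and average them (hypothetical helper)."""
    model.train()  # keeps dropout active, as in the code from the question
    with torch.no_grad():  # no gradients are needed at test time
        # Stack along a new leading dimension: [T, batch_size, nb_classes]
        outputs = torch.stack([model(data) for _ in range(T)], dim=0)
    output_mean = outputs.mean(dim=0)  # [batch_size, nb_classes]
    output_std = outputs.std(dim=0)    # spread across the T passes
    pred = output_mean.argmax(dim=1)   # one class index per sample
    return output_mean, output_std, pred


torch.manual_seed(0)
# Stand-in model and random data, only used to exercise the helper.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p=0.3), nn.Linear(16, 10))
data = torch.randn(4, 8)
target = torch.randint(0, 10, (4,))

output_mean, output_std, pred = mc_dropout_predict(model, data, T=20)
# Same loss computation as in the question: log_softmax of the averaged outputs.
loss = F.nll_loss(F.log_softmax(output_mean, dim=1), target, reduction="sum")
print(output_mean.shape, output_std.shape, pred.shape)
```

Note that model.train() also changes the behavior of layers such as batch norm, so this shortcut is only safe for models where dropout is the only layer that behaves differently in training mode.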

Ka_Hina · August 9, 2020, 10:19pm

Thank you so much for taking the time to debug the code. I would like to know whether this process is applied to each image separately or to a batch of images?

ptrblck · August 10, 2020, 4:29am

Based on the current code, I assume that the input batch will be used T times.
However, I don’t know where the mean is calculated, as the code is not formatted properly (i.e. where output_mean and the following operations are used).

Ka_Hina · August 11, 2020, 11:11am

Actually, the mean is calculated after every T iterations:

for i in range(T):
    output_list.append(torch.unsqueeze(model(data), 0))
output_mean = torch.cat(output_list, 0).mean(0)

But doing this for each image separately takes too long to run, so I was wondering whether I could calculate the mean over batches. Would that make it run faster?

ptrblck · August 11, 2020, 10:53pm

Yes, you could use a batched input, if that’s not already the case for data.
If test_loader uses a batch size which is larger than 1, your code would already use batches instead of single samples.
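
The following sketch illustrates the batching point, assuming random stand-in data and a tiny stand-in model (neither comes from the thread). With batch_size=4 and 10 samples, the loader yields three batches, so T passes per batch means 3 * T model calls instead of 10 * T calls for single samples.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Random stand-in data: 10 "images" of shape [3, 32, 32] with labels in [0, 10).
dataset = TensorDataset(torch.randn(10, 3, 32, 32), torch.randint(0, 10, (10,)))
test_loader = DataLoader(dataset, batch_size=4, shuffle=False)

# Tiny stand-in model; the Net class from the question could be used instead.
model = nn.Sequential(nn.Flatten(), nn.Dropout(p=0.3), nn.Linear(3 * 32 * 32, 10))
model.train()  # keep dropout active

T = 5
mean_shapes = []
with torch.no_grad():
    for data, target in test_loader:
        # Each of the T passes processes the whole batch at once.
        outputs = torch.stack([model(data) for _ in range(T)], dim=0)  # [T, B, 10]
        mean_shapes.append(tuple(outputs.mean(dim=0).shape))            # (B, 10)

print(mean_shapes)  # [(4, 10), (4, 10), (2, 10)]
```

The mean is still computed per sample, because averaging over dim 0 only collapses the T passes and leaves the batch dimension intact.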

Ka_Hina · August 12, 2020, 6:54pm

I’m sorry for asking so many questions, but I’m a bit confused about this method. I would like to know why they are calculating the mean in:

output_mean = torch.cat(output_list, 0).mean(0)

for just the first element of output_list, and treating it as the correct prediction out of all the T tensors in output_list?

ptrblck · August 13, 2020, 3:40am

The mean is calculated over dim0, not for the first element.
Here is a code snippet to show how this operation works:

import torch

batch_size = 2
nb_classes = 3

# initialize empty list
output_list = []

# append predictions to list
for _ in range(10):
    # Here you would call the model and create the outputs
    output_list.append(torch.randn(batch_size, nb_classes))

# Concatenate in dim0 and create tensor
output_list = torch.cat(output_list, 0)
print(output_list.shape) # [20, 3] = [batch_size * 10, nb_classes]

# Calculate mean in dim0 (over all samples)
output_mean = output_list.mean(0)
print(output_mean.shape) # [3]
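
For comparison, the code in the question calls torch.unsqueeze on each output before concatenating, so the shapes differ slightly from the snippet above. A minimal sketch of that variant, using random tensors as stand-ins for the model outputs (the variable names stacked and manual_mean_sample0 are made up for this example):

```python
import torch

torch.manual_seed(0)
batch_size, nb_classes, T = 2, 3, 10

# Stand-ins for T model outputs, each of shape [batch_size, nb_classes].
outputs = [torch.randn(batch_size, nb_classes) for _ in range(T)]

# As in the MC Dropout code: add a leading dimension, then concatenate in dim0.
stacked = torch.cat([torch.unsqueeze(o, 0) for o in outputs], 0)
print(stacked.shape)      # torch.Size([10, 2, 3]) = [T, batch_size, nb_classes]

# The mean over dim0 collapses the T passes and keeps one row per sample.
output_mean = stacked.mean(0)
print(output_mean.shape)  # torch.Size([2, 3]) = [batch_size, nb_classes]

# Check against a manual average of the T predictions for the first sample.
manual_mean_sample0 = sum(o[0] for o in outputs) / T
print(torch.allclose(output_mean[0], manual_mean_sample0, atol=1e-6))
```

In this variant every sample in the batch keeps its own averaged prediction, which is why the subsequent max over dim 1 yields one class index per image.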

Ka_Hina · August 13, 2020, 9:43am

Now it’s clear! Thank you so much.