 # Using nn.Dropout2d at eval() time (for modelling uncertainty)

Hello,

Backstory:

I’ve taken some inspiration from this post on the fast.ai forums:

to build in dropout at evaluation time as a way of attempting to measure the uncertainty of a prediction.

I also used this post as a basis for `.apply()`-ing a function at `.eval()` time:

The way I understand these techniques:

By applying dropout at evaluation time and running many forward passes (10-100+), you get predictions from an ensemble of slightly different models, since each pass drops a different random set of units.

What you can then do with these predictions is measure how much they differ (get the `.var()` of your 100 different samples).

With this difference, you can then see what samples the model is ‘uncertain’ about (the ones with high variance).
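To make the idea above concrete, here is a minimal self-contained sketch of MC Dropout: keep the model in `eval()` mode but flip just the `nn.Dropout2d` modules back to training mode, run T stochastic passes, and take the variance of the softmax outputs. The toy model, shapes, and parameter values here are my own illustrations, not from the original post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network with spatial dropout (architecture is illustrative only)
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.Dropout2d(p=0.5),
    nn.Flatten(),
    nn.Linear(4 * 8 * 8, 10),
)
model.eval()

# Re-enable dropout at eval time: only the Dropout2d modules go back
# to train mode, so batch norm etc. would stay in eval behaviour.
for m in model.modules():
    if isinstance(m, nn.Dropout2d):
        m.train()

x = torch.randn(1, 1, 8, 8)  # one fake 8x8 grayscale input
T = 50
with torch.no_grad():
    # Each pass drops a different set of channels -> different outputs
    probs = torch.stack([torch.softmax(model(x), dim=1) for _ in range(T)])

# Per-class variance across the T stochastic passes; high variance
# signals an input the model is uncertain about.
variance = probs.var(dim=0)
print(variance.shape)  # torch.Size([1, 10])
```

Because only the dropout layers are toggled, the rest of the network (including any batch norm) keeps its evaluation-time behaviour, which is the usual MC Dropout recipe.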

Use case example:

Uber seems to be using this technique for some of their predictions: https://eng.uber.com/tag/monte-carlo-dropout/

My main question (more of a sanity check…):

I’ve put together an example pipeline using MNIST but I’m unsure of some of the custom functions I’ve created/taken from code examples online.

Has anyone had experience with Monte Carlo Dropout or another method of measuring uncertainty they can share?

My code (critiques/advice welcome):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Create function to apply to model at eval() time
def apply_dropout(m):
    if isinstance(m, nn.Dropout2d):
        m.train()

# Func to predict MNIST class
def predict_class(model, X):
    model.eval()
    model.apply(apply_dropout)  # re-enable dropout at pred time (see func above)
    with torch.no_grad():       # no gradients needed at prediction time
        outputs = model(X)      # Variable() is no longer needed in modern PyTorch
    _, pred = torch.max(outputs, 1)
    return pred.numpy()

# Run for T times and collect list_of_preds for measuring variance
def predict(model, X, T=100):
    list_of_preds = []
    standard_pred = predict_class(model, X)
    y1 = []
    y2 = []
    with torch.no_grad():
        for _ in range(T):
            _y1 = model(X)
            _y2 = F.softmax(_y1, dim=1)
            y1.append(_y1.numpy())
            y2.append(_y2.numpy())
            list_of_preds.append(predict_class(model, X))  # predict T times
    return standard_pred, np.array(y1), np.array(y2), np.array(list_of_preds)
```
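One simple way to turn the stacked class predictions from the T passes into a per-sample uncertainty score is a disagreement rate: the fraction of passes that differ from the most common prediction. The array below is synthetic stand-in data with the same `(T, batch)` shape the code above produces; the helper name `disagreement` is my own, not from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the stacked predictions of T=100 stochastic
# passes over a batch of 3 samples, shape (T, batch).
list_of_preds = np.stack([
    np.full(100, 7),                        # sample 0: always predicts class 7
    rng.integers(0, 10, size=100),          # sample 1: predictions jump around
    np.where(rng.random(100) < 0.9, 3, 5),  # sample 2: mostly 3, sometimes 5
], axis=1)

def disagreement(preds):
    """Fraction of passes disagreeing with the modal prediction, per sample."""
    scores = []
    for col in preds.T:                     # one column per sample in the batch
        counts = np.bincount(col, minlength=10)
        scores.append(1.0 - counts.max() / len(col))
    return np.array(scores)

print(disagreement(list_of_preds))
```

A score of 0 means every pass agreed (confident), while a score approaching 1 means the passes scattered across many classes (uncertain), which matches the `.var()`-based intuition described earlier.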

Your code looks fine to me. I tried a similar implementation on a different data set. I did see some variation in the predictions for in-class samples, but virtually none in the predictions for out-of-class samples. My network also used batch norm. However, when I put the whole `model` in training mode, I observed significant variations in the predictions for both in-class and out-of-class samples, but the prediction accuracy for in-class samples was very low. What did you observe in your experiments? Even results on just the in-class samples would be interesting to me.