Using nn.Dropout2d at eval() time (for modelling uncertainty)

Hello,

Backstory:

I’ve taken some inspiration from this post on the fast.ai forums:

to build in dropout at evaluation time as a way of attempting to measure the uncertainty of a prediction.

I also used this post as a basis for .apply()-ing a function at .eval() time:

The way I understand these techniques:

By applying dropout at evaluation time and running many forward passes (10-100+), you get predictions from a variety of different models, since each pass drops a different random subset of activations.

What you can then do with these predictions is measure how much they differ (get the .var() across your 100 different predictions).

With this difference, you can then see which inputs the model is ‘uncertain’ about (the ones with high variance).
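
As a rough illustration of the variance step described above, here is a minimal NumPy sketch. The array `probs` is synthetic stand-in data (not real model output), with an assumed shape of (T, N, C): T stochastic passes, N inputs, C classes. The helper name `rank_by_variance` is hypothetical, and summing the variance over classes is just one possible way to get a single score per input.

```python
import numpy as np

def rank_by_variance(probs):
    """Rank inputs by how much their predicted probabilities vary across passes.

    probs: array of shape (T, N, C) holding softmax outputs from T passes.
    Returns (per_input_variance, indices sorted from most to least uncertain).
    """
    # Variance over the T passes for every (input, class) pair, summed over classes
    per_input_variance = probs.var(axis=0).sum(axis=1)
    order = np.argsort(-per_input_variance)
    return per_input_variance, order

# Synthetic stand-in for T=100 passes over N=3 inputs with C=2 classes
rng = np.random.default_rng(0)
T = 100
stable = np.tile([0.95, 0.05], (T, 1))       # identical on every pass
p = rng.uniform(0.2, 0.8, size=T)            # fluctuates from pass to pass
unstable = np.stack([p, 1.0 - p], axis=1)
probs = np.stack([stable, unstable, stable], axis=1)  # shape (T, 3, 2)

variance, order = rank_by_variance(probs)
```

In this toy setup the second input (index 1) is the only one whose probabilities change between passes, so it is ranked as the most uncertain.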

Use case example:

Uber seems to be using this technique for some of their predictions: https://eng.uber.com/tag/monte-carlo-dropout/

My main question (more of a sanity check…):

I’ve put together an example pipeline using MNIST but I’m unsure of some of the custom functions I’ve created/taken from code examples online.

Has anyone had experience with Monte Carlo Dropout or another method of measuring uncertainty they can share?

My code (critiques/advice welcome):

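# Imports assumed by the code below (they were not shown in the original snippet)
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

# Function to apply to the model at eval() time:
# puts only the Dropout2d layers back into training mode
def apply_dropout(m):
    if isinstance(m, nn.Dropout2d):
        m.train()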

# Func to predict MNIST class (with dropout active)
def predict_class(model, X):
    model.eval()
    model.apply(apply_dropout)  # re-enable dropout at pred time (see func above)
    with torch.no_grad():  # no gradients are needed for inference
        outputs = model(X)  # Variable wrappers are unnecessary since PyTorch 0.4
    pred = outputs.argmax(dim=1)
    return pred.cpu().numpy()

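# Run T stochastic forward passes and collect the outputs for measuring variance
def predict(model, X, T=100):
    model.eval()
    y1, y2, list_of_preds = [], [], []
    with torch.no_grad():
        # Baseline prediction taken while dropout is still disabled
        standard_pred = model(X).argmax(dim=1).cpu().numpy()
        model.apply(apply_dropout)  # now re-enable only the Dropout2d layers
        for _ in range(T):
            _y1 = model(X)  # raw outputs for one random dropout mask
            _y2 = F.softmax(_y1, dim=1)
            y1.append(_y1.cpu().numpy())
            y2.append(_y2.cpu().numpy())
            # Take the class from the same pass so it matches y1/y2
            # (calling predict_class here would run a second, different pass)
            list_of_preds.append(_y1.argmax(dim=1).cpu().numpy())
    return standard_pred, np.array(y1), np.array(y2), np.array(list_of_preds)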
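
One possible way to summarise the arrays returned by predict() is sketched below. It assumes y2 has shape (T, N, C) and list_of_preds has shape (T, N). The function name `summarise_mc_outputs`, the use of predictive entropy, and the toy arrays are assumptions for illustration, not part of the pipeline above.

```python
import numpy as np

def summarise_mc_outputs(y2, list_of_preds):
    """Summarise T stochastic passes.

    y2: softmax outputs, shape (T, N, C)
    list_of_preds: predicted class per pass, shape (T, N)
    """
    mean_probs = y2.mean(axis=0)            # (N, C) averaged prediction
    mc_pred = mean_probs.argmax(axis=1)     # (N,) class of the averaged prediction
    prob_var = y2.var(axis=0).sum(axis=1)   # (N,) variance summed over classes
    # Entropy of the averaged distribution; eps avoids log(0)
    eps = 1e-12
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
    # Fraction of passes whose class matches the averaged prediction
    agreement = (list_of_preds == mc_pred[None, :]).mean(axis=0)
    return mc_pred, prob_var, entropy, agreement

# Toy stand-in data: T=3 passes, N=2 inputs, C=2 classes
y2 = np.array([[[0.9, 0.1], [0.7, 0.3]],
               [[0.9, 0.1], [0.2, 0.8]],
               [[0.9, 0.1], [0.7, 0.3]]])
list_of_preds = y2.argmax(axis=2)
mc_pred, prob_var, entropy, agreement = summarise_mc_outputs(y2, list_of_preds)
```

Here the first input gets the same prediction on every pass, while the second flips class on one of the three passes, which shows up as higher variance, higher entropy, and lower agreement.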

Your code looks fine to me. I tried a similar implementation on a different data set. I saw some variation in the predictions for in-class samples, but virtually none for out-of-class samples. My network also used batch norm. However, when I put the whole model in training mode, I observed significant variation in the predictions for both in-class and out-of-class samples, but the prediction accuracy for in-class samples was very low. What did you observe in your experiments? Even results on in-class samples alone would be interesting to me.
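
For context on the batch norm observation: in training mode, batch norm layers normalise with the statistics of the current batch rather than their running averages, which may explain part of the accuracy drop when the whole model is switched to train(). Below is a minimal sketch of enabling only the dropout layers while batch norm stays in eval mode. The toy network `TinyNet`, the helper `enable_dropout`, and the random input are hypothetical stand-ins, not the network from either post.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical toy network with both batch norm and Dropout2d."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(4)
        self.drop = nn.Dropout2d(p=0.5)
        self.fc = nn.Linear(4 * 28 * 28, 10)

    def forward(self, x):
        x = self.drop(torch.relu(self.bn(self.conv(x))))
        return self.fc(x.flatten(1))

def enable_dropout(model):
    """Put only dropout layers in training mode; everything else stays in eval mode."""
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()

torch.manual_seed(0)
model = TinyNet()
model.eval()            # batch norm now uses its running statistics
enable_dropout(model)   # dropout masks are sampled again on every pass

x = torch.randn(8, 1, 28, 28)  # random stand-in for a batch of MNIST-sized images
with torch.no_grad():
    # 20 stochastic passes stacked into shape (20, 8, 10)
    passes = torch.stack([model(x).softmax(dim=1) for _ in range(20)])
variance = passes.var(dim=0)   # (8, 10) per-class variance across passes
```

With this setup, `model.bn.training` stays False while `model.drop.training` is True, so any variation between passes comes from dropout alone.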