Diagnosing bias / variance issues with dropout

Having created and run a few networks with Torch now, one issue I keep running into is that networks with Dropout tend to have test/validation accuracies higher than their training accuracy.

I suppose this is expected, since we are handicapping the network with a few layers that randomly drop activations during training, but it does make it harder to diagnose whether performance problems over the course of training are due to 1) bias (underfitting) or 2) variance (overfitting).

Does anyone have recommendations for dealing with this?

One solution might be, after each epoch, to evaluate the model (in model.eval() mode) not only on the validation set but also on the training set, so that Dropout isn’t applied and we see the “true” score on the training data.
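Something like the sketch below is what I have in mind, assuming a classification setup with hypothetical `train_loader` / `val_loader` objects:

```python
import torch

def evaluate(model, loader, device="cpu"):
    """Compute accuracy with Dropout disabled (eval mode)."""
    model.eval()                      # Dropout becomes identity here
    correct, total = 0, 0
    with torch.no_grad():             # no gradients needed for evaluation
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            preds = model(inputs).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.size(0)
    model.train()                     # restore training mode afterwards
    return correct / total

# After each epoch:
# train_acc = evaluate(model, train_loader)   # "true" train score, no Dropout
# val_acc   = evaluate(model, val_loader)
# train_acc >> val_acc suggests variance (overfitting);
# both low suggests bias (underfitting).
```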

Anyone have suggestions for such situations? Thanks!