I have no doubt this works fine. I only wonder whether I sacrifice any noteworthy performance when my dropout probability is 0.0, which makes all Dropout layers essentially identity functions. In principle, I could do something like
```python
if self.dropout_prob is not None and self.dropout_prob > 0.0:
    out = self.dropout1(out)
```
Would this have any measurable advantage in practice?
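One way to answer this for your own setup is to time it directly. Below is a minimal micro-benchmark sketch, assuming PyTorch: it compares calling an `nn.Dropout(p=0.0)` layer against skipping it entirely. (The tensor shape, iteration count, and the `bench` helper are illustrative choices, not from the original post.)

```python
import time
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(256, 1024)          # illustrative batch; adjust to your model's shapes
dropout = nn.Dropout(p=0.0)

def bench(fn, iters=1000):
    # Warm up, then time `iters` forward calls.
    for _ in range(10):
        fn(x)
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return time.perf_counter() - start

t_dropout = bench(dropout)          # Dropout layer with p=0.0
t_identity = bench(lambda t: t)     # skipping dropout entirely

print(f"p=0.0 dropout: {t_dropout:.4f}s  vs  identity: {t_identity:.4f}s")

# With p=0.0 nothing is dropped and the 1/(1-p) rescaling is 1,
# so the output is numerically identical to the input.
assert torch.equal(dropout(x), x)
```

In my understanding, recent PyTorch versions short-circuit dropout when `p == 0` (and always in `eval()` mode), so the remaining cost is essentially a Python-level function call per layer; whether that is measurable depends on how small your layers are relative to that overhead, which is exactly what a benchmark like this reveals.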