Expected behavior of Dropout?

Hi,

I don’t know if I’m doing something wrong, but I went through the tutorial https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#sphx-glr-intermediate-seq2seq-translation-tutorial-py to bring myself up to date with PyTorch 1.0 and applied the same changes to my own code.

I have noticed that during inference, Dropout still randomly zeroes elements because self.training=True, even inside torch.no_grad(). Is this expected? If so, how does the tutorial handle it? Previously we were supposed to do something like model.eval().

Here is some sample code:

In [1]: import torch

In [2]: a = torch.Tensor([1,2,3,4,5])

In [3]: d = torch.nn.Dropout()

In [4]: d = torch.nn.Dropout(0.2)

In [5]: d(a)
Out[5]: tensor([0.0000, 0.0000, 0.0000, 0.0000, 6.2500])

In [6]: d(a)
Out[6]: tensor([1.2500, 2.5000, 3.7500, 0.0000, 6.2500])

In [7]: d(a)
Out[7]: tensor([1.2500, 0.0000, 3.7500, 5.0000, 6.2500])

In [8]: with torch.no_grad():
   ...:     d(a)
   ...:     

In [9]: with torch.no_grad():
   ...:     print(d(a))
   ...:     
   ...:     
tensor([1.2500, 2.5000, 3.7500, 5.0000, 6.2500])

In [10]: with torch.no_grad():
    ...:     print(d(a))
    ...:     
    ...:     
tensor([1.2500, 2.5000, 3.7500, 0.0000, 6.2500])

In [11]: with torch.no_grad():
    ...:     print(d(a))
    ...:     
    ...:     
tensor([1.2500, 2.5000, 3.7500, 5.0000, 0.0000])

In [12]: with torch.no_grad():
    ...:     print(d(a))
    ...:     
    ...:     
tensor([1.2500, 2.5000, 3.7500, 5.0000, 6.2500])

In [13]: with torch.no_grad():
    ...:     print(d(a))
    ...:     
    ...:     
tensor([1.2500, 2.5000, 3.7500, 0.0000, 6.2500])

In [14]: with torch.no_grad():
    ...:     print(d(a))
    ...:     
    ...:     
tensor([1.2500, 2.5000, 3.7500, 5.0000, 0.0000])

In [15]: with torch.no_grad():
    ...:     print(d(a))
    ...:     print(d.training)
    ...:     
    ...:     
tensor([1.2500, 2.5000, 0.0000, 5.0000, 6.2500])
True

In [16]: with torch.no_grad():
    ...:     print(d(a))
    ...:     print(d.training)
    ...:     
    ...:     
tensor([1.2500, 0.0000, 0.0000, 5.0000, 6.2500])
True

In [17]: print(d.training)
True

Yeah, this is expected, precisely because d.training is still True. And yes, you should call model.eval() before testing. Could you tell me where that tutorial suggests otherwise? I took a quick glance but didn’t see anything obvious.
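To make this concrete, here is a minimal sketch of the fix: calling .eval() on the module sets self.training to False, and in eval mode Dropout is the identity, so nothing gets zeroed anymore.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
d = torch.nn.Dropout(0.2)

d.eval()  # sets d.training = False; Dropout becomes a no-op in eval mode
out = d(a)
print(out)         # tensor([1., 2., 3., 4., 5.]) -- input passed through unchanged
print(d.training)  # False
```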

Thank you for your quick answer! I’m also looking at the thread ‘model.eval()’ vs ‘with torch.no_grad()’, where they say the recommended way should be torch.no_grad() (which should also be faster). In the end, it’s not very clear. What should we do?

I didn’t re-read the tutorial text, but in its code you can see self.dropout = nn.Dropout(self.dropout_p) at line 451 and with torch.no_grad(): at line 699, with no encoder.eval() or decoder.eval() anywhere. I found the same pattern in other tutorials as well.
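If that’s the case, the tutorial’s evaluate step would presumably need a fix along these lines (the encoder/decoder below are hypothetical stand-ins, not the tutorial’s actual modules):

```python
import torch

# Hypothetical stand-ins for the tutorial's encoder/decoder, each containing Dropout
encoder = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(0.1))
decoder = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Dropout(0.1))

def evaluate(x):
    # Switch Dropout (and any batchnorm) to inference behavior...
    encoder.eval()
    decoder.eval()
    # ...and, separately, disable gradient tracking
    with torch.no_grad():
        return decoder(encoder(x))

out = evaluate(torch.ones(1, 4))
print(encoder.training, decoder.training)  # False False
```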