# Expected behavior Dropout?

Hi,

I don’t know if I’m doing something wrong, but I went through the tutorial https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#sphx-glr-intermediate-seq2seq-translation-tutorial-py to bring myself up to date with PyTorch 1.0 and applied the same changes to my code.

I have noticed that during inference, Dropout still randomly zeroes elements because self.training is True, even inside torch.no_grad(). Is this expected? If yes, how does the tutorial handle it? Previously we were supposed to call something like model.eval().

Here is a sample session:

```
In [1]: import torch

In [2]: a = torch.Tensor([1, 2, 3, 4, 5])

In [3]: d = torch.nn.Dropout()

In [4]: d = torch.nn.Dropout(0.2)

In [5]: d(a)
Out[5]: tensor([0.0000, 0.0000, 0.0000, 0.0000, 6.2500])

In [6]: d(a)
Out[6]: tensor([1.2500, 2.5000, 3.7500, 0.0000, 6.2500])

In [7]: d(a)
Out[7]: tensor([1.2500, 0.0000, 3.7500, 5.0000, 6.2500])

In [8]: with torch.no_grad():
   ...:     d(a)
   ...:

In [9]: with torch.no_grad():
   ...:     print(d(a))
   ...:
tensor([1.2500, 2.5000, 3.7500, 5.0000, 6.2500])

In [10]: with torch.no_grad():
    ...:     print(d(a))
    ...:
tensor([1.2500, 2.5000, 3.7500, 0.0000, 6.2500])

In [11]: with torch.no_grad():
    ...:     print(d(a))
    ...:
tensor([1.2500, 2.5000, 3.7500, 5.0000, 0.0000])

In [12]: with torch.no_grad():
    ...:     print(d(a))
    ...:
tensor([1.2500, 2.5000, 3.7500, 5.0000, 6.2500])

In [13]: with torch.no_grad():
    ...:     print(d(a))
    ...:
tensor([1.2500, 2.5000, 3.7500, 0.0000, 6.2500])

In [14]: with torch.no_grad():
    ...:     print(d(a))
    ...:
tensor([1.2500, 2.5000, 3.7500, 5.0000, 0.0000])

In [15]: with torch.no_grad():
    ...:     print(d(a))
    ...:     print(d.training)
    ...:
tensor([1.2500, 2.5000, 0.0000, 5.0000, 6.2500])
True

In [16]: with torch.no_grad():
    ...:     print(d(a))
    ...:     print(d.training)
    ...:
tensor([1.2500, 0.0000, 0.0000, 5.0000, 6.2500])
True

In [17]: print(d.training)
True
```

Yeah, this is expected, precisely because d.training is still True. And yes, you should call model.eval() before testing. Could you tell me where that tutorial suggests otherwise? I took a quick glance but didn’t see anything obvious.
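For what it’s worth, the difference is easy to verify directly. This is just a small sketch (the values and seed are arbitrary, chosen for reproducibility):

```python
import torch

torch.manual_seed(0)
drop = torch.nn.Dropout(p=0.5)
x = torch.ones(1000)

# torch.no_grad() only disables autograd bookkeeping;
# the module stays in training mode, so dropout is still active.
with torch.no_grad():
    out = drop(x)
print(drop.training)             # True
print((out == 0).any().item())   # True: elements are still being zeroed

# eval() flips the training flag; Dropout then becomes the identity.
drop.eval()
print(torch.equal(drop(x), x))   # True: no zeroing, no rescaling
```

So wrapping the forward pass in torch.no_grad() alone does not turn dropout off; only the eval()/train(False) switch does.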

Thank you for your quick answer! I’m also looking at the thread ‘model.eval()’ vs ‘with torch.no_grad()’, where they say that the recommended way is torch.no_grad() (and that it should be faster). In the end, it’s not very clear. What should we do?

I didn’t read the whole tutorial, but in the code you can see `self.dropout = nn.Dropout(self.dropout_p)` at line 451 and `with torch.no_grad():` at line 699, with no encoder.eval() or decoder.eval(). I could also find similar patterns in other tutorials.
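The two mechanisms are complementary rather than alternatives: eval() changes layer behavior (Dropout, BatchNorm), while torch.no_grad() skips building the autograd graph to save time and memory. A minimal inference sketch using both (the tiny model here is a made-up stand-in for the tutorial’s encoder/decoder, not its actual code):

```python
import torch

# Stand-in model containing a dropout layer
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8),
    torch.nn.Dropout(0.5),
    torch.nn.Linear(8, 2),
)

model.eval()                   # Dropout/BatchNorm switch to inference behavior
with torch.no_grad():          # no gradient graph is recorded
    out = model(torch.randn(3, 4))

print(model.training)          # False: dropout is a no-op here
print(out.requires_grad)       # False: no graph was built

model.train()                  # switch back before resuming training
```

With both in place you get deterministic outputs and the memory/speed benefit; using only torch.no_grad(), as in the tutorial, leaves dropout active during evaluation.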