Hello,

I am trying to investigate some irregular behavior in my trained network. The encoder is a standard PyTorch Transformer encoder. I have found that adding the `@torch.no_grad()` decorator to a given function changes the encoder's outputs. For example:

```
def forward_1(self, x):
    return self.encoder(x)

@torch.no_grad()
def forward_2(self, x):
    return self.encoder(x)
```

If I set the model to `eval()` and run both functions above with identical input, I get two different outputs.

If I am not mistaken, `model.eval()` should disable all sources of randomness such as dropout, and gradient computation itself should not affect the outputs in any way. In other words, **shouldn't the two functions above return the exact same Tensor?**
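For reference, here is a minimal, self-contained sketch of how I am comparing the two paths. The encoder configuration, input shape, and seed below are placeholders rather than my actual model, but the structure of the comparison is the same:

```
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder encoder -- my real model uses different sizes,
# but it is also a standard nn.TransformerEncoder.
encoder_layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
encoder.eval()

x = torch.randn(1, 10, 32)

def forward_1(x):
    # plain forward pass
    return encoder(x)

@torch.no_grad()
def forward_2(x):
    # same forward pass, but with gradient tracking disabled
    return encoder(x)

out_1 = forward_1(x)
out_2 = forward_2(x)

# I would expect this to print True
print(torch.allclose(out_1, out_2))
```

In my actual setup, the equivalent of this `allclose` check returns `False`.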

Many thanks in advance.