AlphaDropout wrong behavior during evaluation

Hello there,

I started using PyTorch yesterday, so bear with me!

Anyway, I implemented a simple feedforward neural network using the SELU activation function. With this activation, standard Dropout cannot be used; AlphaDropout should be used instead, and it is already available as a torch.nn module.
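
For intuition, here is a minimal sketch (separate from my model) of why AlphaDropout is preferred with SELU: on roughly zero-mean, unit-variance SELU activations, standard Dropout inflates the variance, while AlphaDropout is designed to keep the mean near 0 and the standard deviation near 1 during training:

import torch
import torch.nn as nn

torch.manual_seed(0)
x = nn.SELU()(torch.randn(100_000))   # approximately zero-mean, unit-variance activations

d_out = nn.Dropout(0.5)(x)            # standard Dropout (modules default to train mode)
a_out = nn.AlphaDropout(0.5)(x)       # AlphaDropout

print(d_out.mean().item(), d_out.std().item())   # mean ~0, std ~1.4 (variance inflated)
print(a_out.mean().item(), a_out.std().item())   # mean ~0, std ~1.0 (statistics preserved)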

However, it seems that AlphaDropout is not deactivated at evaluation time (in fact, the same issue seems to occur in Keras/TensorFlow as well). Let’s take an example:

import numpy as np
import torch
import torch.nn as nn


class FFNN(nn.Module):

    def __init__(self):
        super().__init__()

        self.linear1 = nn.Linear(300, 512)
        self.linear2 = nn.Linear(512, 1)
        self.dropout = nn.AlphaDropout(0.5)
        self.act = nn.SELU()

    def forward(self, xb):
        xb = self.dropout(self.act(self.linear1(xb)))
        xb = self.linear2(xb)

        return xb

def fit(epochs, bs, model, loss_func, opt, train_ds, valid_ds):
    train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
    for epoch in range(epochs):
        # Training pass: dropout is active
        model.train()
        train_losses, train_nums = zip(*[loss_batch(model, loss_func(), xb, yb, opt) for xb, yb in train_dl])

        # Evaluation pass on the same training data: dropout should be a no-op here
        model.eval()
        with torch.no_grad():
            train_losses2, train_nums2 = zip(*[loss_batch(model, loss_func(), xb, yb) for xb, yb in train_dl])
        train_loss = np.sum(np.multiply(train_losses, train_nums)) / np.sum(train_nums)
        train_loss2 = np.sum(np.multiply(train_losses2, train_nums2)) / np.sum(train_nums2)

        print(epoch, train_loss, train_loss2)
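
For reference, get_data and loss_batch are not shown here; I’m assuming the usual helpers from the torch.nn tutorial, roughly:

from torch.utils.data import DataLoader

def get_data(train_ds, valid_ds, bs):
    # Wrap the datasets in DataLoaders; shuffle only the training set
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

def loss_batch(model, loss_func, xb, yb, opt=None):
    # Compute the loss on one batch; only step the optimizer when opt is given
    loss = loss_func(model(xb), yb)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)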

With this code, let’s run a few experiments. Each output line shows the epoch, the loss in train mode, and the loss in eval mode computed on the same training data:

  1. SELU and AlphaDropout(0.0)
0 0.36914605454162314 0.19611775599144124
1 0.16745460151679933 0.1478944589142446
2 0.15180683443470608 0.13595040272625666
3 0.14030311662684042 0.1300328708988018
4 0.1343339836944348 0.12687919099652578
5 0.1336454855938437 0.13134129443969675
  2. SELU and AlphaDropout(0.5)
0 0.5045919609132898 0.7223648498297999
1 0.3135877888669413 0.9336661428370804
2 0.2849435528750142 1.2059029422739826
3 0.2717001248131353 1.3849857490529458
4 0.2621053279076935 1.4490251131158658
5 0.2594372811456206 1.2010613875414329
  3. ReLU and Dropout(0.0)
0 0.2965477593834438 0.18009340502913035
1 0.1717412172959595 0.15529460897521366
2 0.153808161379799 0.14437383949441254
3 0.14423141939930184 0.13402091928575405
4 0.13592245311491072 0.1272785943494272
5 0.12786124114479339 0.1251134395441681
  4. ReLU and Dropout(0.5)
0 0.3608384240398962 0.20271951851075287
1 0.2268548979172631 0.17966684643869046
2 0.2109495293368738 0.17081280320725112
3 0.2035760818019746 0.16349141307608792
4 0.2011605006046396 0.17637636540112672
5 0.1890329434758141 0.15228609957549938

As you can see from the logs, when AlphaDropout is active (p=0.5), the eval-mode loss is significantly worse than the training loss, even though it is computed on the same data! On the other hand, ReLU with standard Dropout seems to work as intended.

Do you think it is possible that AlphaDropout isn’t deactivated during evaluation? Or is my understanding wrong?

Doesn’t the fact that there is a gap suggest that AlphaDropout is disabled in eval?

Indeed, maybe it is disabled, but without rescaling the values properly?

It will indeed be disabled in .eval() mode. SELU and AlphaDropout work well when the inputs are normalized. Try normalizing your inputs.
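
A quick way to convince yourself (minimal sketch): in eval mode AlphaDropout is the identity, whereas in train mode it masks and rescales the values.

import torch
import torch.nn as nn

drop = nn.AlphaDropout(0.5)
x = torch.randn(4, 8)

drop.train()
print(torch.equal(drop(x), x))   # False: units are masked and affinely rescaled

drop.eval()
print(torch.equal(drop(x), x))   # True: AlphaDropout is a no-op in eval mode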

The inputs are already standardized (zero mean and unit variance), so that is not the problem here.
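
For example, a quick sanity check on one batch (assuming a train_dl DataLoader like the one built inside fit):

xb, yb = next(iter(train_dl))
print(xb.mean().item(), xb.std().item())   # roughly 0.0 and 1.0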

Typically, an AlphaDropout rate between 0.05 and 0.1 leads to good performance. AlphaDropout of 0.5 might not give performance similar to a standard Dropout of 0.5. Try reducing the AlphaDropout rate.
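
For example, in the model above the change would just be (value shown is only a suggestion):

import torch.nn as nn

dropout = nn.AlphaDropout(0.05)   # in FFNN.__init__, instead of nn.AlphaDropout(0.5)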

Yeah, I realised that as well. It seems to work better with lower dropout values. Thanks!