Hello there,
I only started with PyTorch yesterday, so bear with me!
Anyway, I implemented a simple feedforward neural network using the SELU activation function. With this activation function, standard Dropout cannot be used; AlphaDropout should be used instead, and it is already available as a torch.nn module.
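For context, the point of AlphaDropout is that it is designed to preserve the zero mean and unit variance that SELU's self-normalization relies on, whereas plain Dropout inflates the variance. A minimal standalone sketch (not part of my network) illustrating this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10000)  # roughly standard-normal activations, as SELU assumes

alpha_drop = nn.AlphaDropout(p=0.5)
plain_drop = nn.Dropout(p=0.5)

# Both modules are in training mode by default, so dropout is active here.
# AlphaDropout keeps the mean and variance close to (0, 1);
# plain Dropout preserves the mean but blows up the variance.
y_alpha = alpha_drop(x)
y_plain = plain_drop(x)
print(y_alpha.mean().item(), y_alpha.std().item())  # ~0, ~1
print(y_plain.mean().item(), y_plain.std().item())  # ~0, ~1.4
```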
However, it seems that AlphaDropout is not deactivated at evaluation time (as a matter of fact, the same problem seems to happen in Keras/TensorFlow as well). Let’s take an example:
class FFNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(300, 512)
        self.linear2 = nn.Linear(512, 1)
        self.dropout = nn.AlphaDropout(0.5)
        self.act = nn.SELU()

    def forward(self, xb):
        xb = self.dropout(self.act(self.linear1(xb)))
        xb = self.linear2(xb)
        return xb
def fit(epochs, bs, model, loss_func, opt, train_ds, valid_ds):
    train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
    for epoch in range(epochs):
        # Training pass: dropout is active.
        model.train()
        train_losses, train_nums = zip(*[loss_batch(model, loss_func(), xb, yb, opt) for xb, yb in train_dl])
        # Second pass over the same training data, in eval mode: dropout should be off.
        model.eval()
        with torch.no_grad():
            train_losses2, train_nums2 = zip(*[loss_batch(model, loss_func(), xb, yb) for xb, yb in train_dl])
        train_loss = np.sum(np.multiply(train_losses, train_nums)) / np.sum(train_nums)
        train_loss2 = np.sum(np.multiply(train_losses2, train_nums2)) / np.sum(train_nums2)
        print(epoch, train_loss, train_loss2)
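(The snippet above calls get_data and loss_batch without defining them; this is a sketch of what they presumably look like, based on the PyTorch “What is torch.nn really?” tutorial that this loop appears to follow. That tutorial origin is an assumption on my part.)

```python
import torch
from torch.utils.data import DataLoader

# Assumed helper definitions, following the PyTorch
# "What is torch.nn really?" tutorial.
def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

def loss_batch(model, loss_func, xb, yb, opt=None):
    # Computes the loss on one batch; takes an optimizer step
    # only when one is passed (i.e. during training).
    loss = loss_func(model(xb), yb)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)
```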
With this code, let’s conduct a few experiments with:
- SELU and AlphaDropout(0.0)
0 0.36914605454162314 0.19611775599144124
1 0.16745460151679933 0.1478944589142446
2 0.15180683443470608 0.13595040272625666
3 0.14030311662684042 0.1300328708988018
4 0.1343339836944348 0.12687919099652578
5 0.1336454855938437 0.13134129443969675
- SELU and AlphaDropout(0.5)
0 0.5045919609132898 0.7223648498297999
1 0.3135877888669413 0.9336661428370804
2 0.2849435528750142 1.2059029422739826
3 0.2717001248131353 1.3849857490529458
4 0.2621053279076935 1.4490251131158658
5 0.2594372811456206 1.2010613875414329
- ReLU and Dropout(0.0)
0 0.2965477593834438 0.18009340502913035
1 0.1717412172959595 0.15529460897521366
2 0.153808161379799 0.14437383949441254
3 0.14423141939930184 0.13402091928575405
4 0.13592245311491072 0.1272785943494272
5 0.12786124114479339 0.1251134395441681
- ReLU and Dropout(0.5)
0 0.3608384240398962 0.20271951851075287
1 0.2268548979172631 0.17966684643869046
2 0.2109495293368738 0.17081280320725112
3 0.2035760818019746 0.16349141307608792
4 0.2011605006046396 0.17637636540112672
5 0.1890329434758141 0.15228609957549938
As you can see from the logs, when AlphaDropout is active (p = 0.5), the evaluation loss is significantly worse than the training loss, even though both are computed on the same data! On the other hand, ReLU with standard Dropout seems to work as intended.
Do you think it is possible that AlphaDropout isn’t deactivated during evaluation? Or is my understanding wrong?
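For what it’s worth, a direct way to check whether model.eval() really turns AlphaDropout off would be something along these lines:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.AlphaDropout(p=0.5)
x = torch.randn(5, 4)

# In training mode the output is stochastic: units are masked
# and the result is affinely rescaled.
drop.train()
y_train = drop(x)

# In eval mode AlphaDropout should act as the identity,
# exactly like plain Dropout.
drop.eval()
y_eval = drop(x)
print(torch.equal(y_eval, x))  # True if dropout is really off
```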