Hello there,
I only started with PyTorch yesterday, so bear with me!
Anyway, I implemented a simple feedforward neural network using the SELU activation function. With this activation function, standard Dropout cannot be used; AlphaDropout should be used instead, and it is already available as a torch.nn module.
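For context, the point of AlphaDropout is that it is designed to preserve the zero mean and unit variance that SELU's self-normalization relies on, whereas plain Dropout inflates the variance. A minimal standalone sketch (not part of my network) illustrating this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10000)  # roughly standard-normal activations, as SELU assumes

alpha_drop = nn.AlphaDropout(p=0.5)
plain_drop = nn.Dropout(p=0.5)

# Both modules are in training mode by default, so dropout is active here.
# AlphaDropout keeps the mean and variance close to (0, 1);
# plain Dropout preserves the mean but blows up the variance.
y_alpha = alpha_drop(x)
y_plain = plain_drop(x)
print(y_alpha.mean().item(), y_alpha.std().item())  # ~0, ~1
print(y_plain.mean().item(), y_plain.std().item())  # ~0, ~1.4
```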
However, it seems that AlphaDropout is not deactivated at evaluation time (as a matter of fact, the same problem seems to happen in Keras/TensorFlow as well). Let’s take an example:
class FFNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(300, 512)
        self.linear2 = nn.Linear(512, 1)
        self.dropout = nn.AlphaDropout(0.5)
        self.act = nn.SELU()

    def forward(self, xb):
        xb = self.dropout(self.act(self.linear1(xb)))
        xb = self.linear2(xb)
        return xb
def fit(epochs, bs, model, loss_func, opt, train_ds, valid_ds):
    train_dl, valid_dl = get_data(train_ds, valid_ds, bs)
    for epoch in range(epochs):
        # Training pass: dropout is active.
        model.train()
        train_losses, train_nums = zip(*[loss_batch(model, loss_func(), xb, yb, opt) for xb, yb in train_dl])
        # Second pass over the same training data, in eval mode: dropout should be off.
        model.eval()
        with torch.no_grad():
            train_losses2, train_nums2 = zip(*[loss_batch(model, loss_func(), xb, yb) for xb, yb in train_dl])
        train_loss = np.sum(np.multiply(train_losses, train_nums)) / np.sum(train_nums)
        train_loss2 = np.sum(np.multiply(train_losses2, train_nums2)) / np.sum(train_nums2)
        print(epoch, train_loss, train_loss2)
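(The snippet above calls get_data and loss_batch without defining them; this is a sketch of what they presumably look like, based on the PyTorch “What is torch.nn really?” tutorial that this loop appears to follow. That tutorial origin is an assumption on my part.)

```python
import torch
from torch.utils.data import DataLoader

# Assumed helper definitions, following the PyTorch
# "What is torch.nn really?" tutorial.
def get_data(train_ds, valid_ds, bs):
    return (
        DataLoader(train_ds, batch_size=bs, shuffle=True),
        DataLoader(valid_ds, batch_size=bs * 2),
    )

def loss_batch(model, loss_func, xb, yb, opt=None):
    # Computes the loss on one batch; takes an optimizer step
    # only when one is passed (i.e. during training).
    loss = loss_func(model(xb), yb)
    if opt is not None:
        loss.backward()
        opt.step()
        opt.zero_grad()
    return loss.item(), len(xb)
```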
With this code, let’s conduct a few experiments with:
- SELU and AlphaDropout(0.0)
0 0.36914605454162314 0.19611775599144124
1 0.16745460151679933 0.1478944589142446
2 0.15180683443470608 0.13595040272625666
3 0.14030311662684042 0.1300328708988018
4 0.1343339836944348 0.12687919099652578
5 0.1336454855938437 0.13134129443969675
- SELU and AlphaDropout(0.5)
0 0.5045919609132898 0.7223648498297999
1 0.3135877888669413 0.9336661428370804
2 0.2849435528750142 1.2059029422739826
3 0.2717001248131353 1.3849857490529458
4 0.2621053279076935 1.4490251131158658
5 0.2594372811456206 1.2010613875414329
- ReLU and Dropout(0.0)
0 0.2965477593834438 0.18009340502913035
1 0.1717412172959595 0.15529460897521366
2 0.153808161379799 0.14437383949441254
3 0.14423141939930184 0.13402091928575405
4 0.13592245311491072 0.1272785943494272
5 0.12786124114479339 0.1251134395441681
- ReLU and Dropout(0.5)
0 0.3608384240398962 0.20271951851075287
1 0.2268548979172631 0.17966684643869046
2 0.2109495293368738 0.17081280320725112
3 0.2035760818019746 0.16349141307608792
4 0.2011605006046396 0.17637636540112672
5 0.1890329434758141 0.15228609957549938
As you can see from the logs, when AlphaDropout is active (p = 0.5), the evaluation loss is significantly worse than the training loss, even though both are computed on the same data! On the other hand, ReLU with standard Dropout seems to work as intended.
Do you think it is possible that AlphaDropout isn’t deactivated during evaluation? Or is my understanding wrong?
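For what it’s worth, a direct way to check whether model.eval() really turns AlphaDropout off would be something along these lines:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.AlphaDropout(p=0.5)
x = torch.randn(5, 4)

# In training mode the output is stochastic: units are masked
# and the result is affinely rescaled.
drop.train()
y_train = drop(x)

# In eval mode AlphaDropout should act as the identity,
# exactly like plain Dropout.
drop.eval()
y_eval = drop(x)
print(torch.equal(y_eval, x))  # True if dropout is really off
```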