ELUs Not Training

Anyone have any experience using ELUs for image classification?

I’m trying to follow the network described in the ELU paper (p. 7), but performance on CIFAR-100 is terrible. The training loss bottoms out at ~0.005 almost instantly, and test accuracy is <= 10%. I must be doing something wrong…

Also, their description of the network doesn’t seem to work as written: with all of the max-pool layers they describe, 32x32 inputs collapse to nothing partway through the network. A quick size-arithmetic sketch of the problem is below, followed by the code I’m actually using. The commented-out maxpools are the ones they describe but that I removed; where they list a dropout probability of 0.0, I omitted the layer entirely.
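Here’s the arithmetic as I read the paper (my assumption about their layout, not their code), with stride-1 unpadded convolutions and 2x2 max pools of stride 2; the 1x1 convolutions don’t change the spatial size, so I skip them:

def conv_out(size, kernel):
    # stride-1 convolution, no padding
    return size - kernel + 1

def pool_out(size):
    # 2x2 max pool, stride 2 (floor mode, as in PyTorch)
    return size // 2

size = 32
size = pool_out(conv_out(size, 5))   # stack 1: 32 -> 28 -> 14
size = pool_out(conv_out(size, 3))   # stack 2: 14 -> 12 -> 6
size = pool_out(conv_out(size, 2))   # stack 3: 6 -> 5 -> 2
size = pool_out(conv_out(size, 2))   # stack 4: 2 -> 1 -> 0 (PyTorch errors out here)
print(size)                          # 0, with three stacks still to go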

import torch.nn as nn


class ELUnet(nn.Module):
    def __init__(self, inplace=True, bias=False):
        super(ELUnet, self).__init__()

        self.net = nn.Sequential(
            ###stack 1
            nn.Conv2d(3, 192, 5, bias=bias),
            nn.ELU(inplace=inplace),
            nn.MaxPool2d(2, 2),
            #no dropout
            ###stack 2
            nn.Conv2d(192, 192, 1, bias=bias),
            nn.Conv2d(192, 240, 3, bias=bias),
            nn.ELU(inplace=inplace),
            nn.MaxPool2d(2, 2),
            nn.Dropout2d(p=0.1),
            ###stack 3
            nn.Conv2d(240, 240, 1, bias=bias),
            nn.Conv2d(240, 260, 2, bias=bias),
            nn.ELU(inplace=inplace),
            #nn.MaxPool2d(2, 2),
            nn.Dropout2d(p=0.2),
            ###stack 4
            nn.Conv2d(260, 260, 1, bias=bias),
            nn.Conv2d(260, 280, 2, bias=bias),
            nn.ELU(inplace=inplace),
            #nn.MaxPool2d(2, 2),
            nn.Dropout2d(p=0.3),
            ###stack 5
            nn.Conv2d(280, 280, 1, bias=bias),
            nn.Conv2d(280, 300, 2, bias=bias),
            nn.ELU(inplace=inplace),
            #nn.MaxPool2d(2, 2),
            nn.Dropout2d(p=0.4),
            ###stack 6
            nn.Conv2d(300, 300, 1, bias=bias),
            nn.ELU(inplace=inplace),
            #nn.MaxPool2d(2, 2),
            nn.Dropout2d(p=0.5),
            ###stack 7
            nn.Conv2d(300, 100, 1, bias=bias),
            nn.ELU(inplace=inplace),
            nn.MaxPool2d(2, 2)
            ###no dropout
        )

    def forward(self, x, print_trigger=False):
        out = self.net(x)
        if print_trigger:
            # for 32x32 inputs, out is (N, 100, 1, 1) at this point
            print('{} pre output {}'.format(self.__class__.__name__, out.size()))
        return out.view(out.size(0), 100)
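For what it’s worth, a dummy forward pass (assuming CIFAR-sized 3x32x32 inputs) does come out at the right shape:

import torch

net = ELUnet()
x = torch.randn(4, 3, 32, 32)      # fake CIFAR-100 batch
out = net(x, print_trigger=True)   # prints 'ELUnet pre output torch.Size([4, 100, 1, 1])'
print(out.size())                  # torch.Size([4, 100])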

Using the same training routine (35 iterations of SGD with SmoothL1Loss and their settings), I got a resnet18 variant to about 46% test error.
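For concreteness, this is roughly my training step, stripped down (the SGD hyperparameters here are placeholders, not the paper’s settings; I one-hot the labels since SmoothL1Loss is a regression loss):

import torch
import torch.nn as nn
import torch.nn.functional as F

net = ELUnet()
criterion = nn.SmoothL1Loss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

def train_step(images, labels):
    # expand integer class labels to one-hot targets to match
    # the network's 100-way output
    targets = F.one_hot(labels, num_classes=100).float()
    optimizer.zero_grad()
    loss = criterion(net(images), targets)
    loss.backward()
    optimizer.step()
    return loss.item()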

Thanks for any insights.

ELU has a tunable parameter alpha. Can you try adjusting that?
http://pytorch.org/docs/nn.html#elu
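Note that alpha only rescales the negative saturation (ELU(x) = alpha * (exp(x) - 1) for x < 0); a quick check like this shows the positive side is unchanged:

import torch
import torch.nn as nn

x = torch.linspace(-3, 3, steps=7)
for alpha in (0.5, 1.0, 2.0):
    print(alpha, nn.ELU(alpha=alpha)(x))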

Setting alpha=0.5 and alpha=2.0 seems to have no effect. Both runs I just did are no better than chance on the validation set.