PyTorch net from: Striving for Simplicity: The All Convolutional Net

Is there an implementation of:

in PyTorch?

I’ve used this, and it works fine on CIFAR10:

import torch.nn as nn
import torch.nn.functional as F


class AllConvNet(nn.Module):

    def __init__(self, dropout=True, nc=3, num_classes=10):
        super(AllConvNet, self).__init__()
        self.dropout = dropout
        self.conv1 = nn.Conv2d(nc, 96, 3, padding=1)
        self.conv2 = nn.Conv2d(96, 96, 3, padding=1)
        self.conv3 = nn.Conv2d(96, 96, 3, padding=1, stride=2)
        self.conv4 = nn.Conv2d(96, 192, 3, padding=1)
        self.conv5 = nn.Conv2d(192, 192, 3, padding=1)
        self.conv6 = nn.Conv2d(192, 192, 3, padding=1, stride=2)
        self.conv7 = nn.Conv2d(192, 192, 3, padding=1)
        self.conv8 = nn.Conv2d(192, 192, 1)
        self.class_conv = nn.Conv2d(192, num_classes, 1)

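    def forward(self, x):
        # Pass training=self.training so dropout is disabled under model.eval();
        # F.dropout defaults to training=True and would otherwise always drop.
        if self.dropout:
            x = F.dropout(x, .2, training=self.training)
        conv1_out = F.relu(self.conv1(x))
        conv2_out = F.relu(self.conv2(conv1_out))
        conv3_out = F.relu(self.conv3(conv2_out))
        if self.dropout:
            conv3_out = F.dropout(conv3_out, .5, training=self.training)
        conv4_out = F.relu(self.conv4(conv3_out))
        conv5_out = F.relu(self.conv5(conv4_out))
        conv6_out = F.relu(self.conv6(conv5_out))
        if self.dropout:
            conv6_out = F.dropout(conv6_out, .5, training=self.training)
        conv7_out = F.relu(self.conv7(conv6_out))
        conv8_out = F.relu(self.conv8(conv7_out))

        class_out = F.relu(self.class_conv(conv8_out))
        # Global average pooling: flatten the spatial dims and take their mean,
        # giving one score per class with shape [batch_size, num_classes].
        pool_out = class_out.reshape(class_out.size(0), class_out.size(1), -1).mean(-1)
        return pool_out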

it was written based on
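
As a sanity check on the layer stack above, the spatial size after each convolution can be traced with the standard output-size formula, floor((n + 2*padding - kernel) / stride) + 1. This is a minimal sketch in plain Python, assuming 32x32 CIFAR10 inputs; the names `conv_out_size`, `LAYERS` and `trace_sizes` are illustrative and not part of the original code.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for conv1..conv8 and class_conv, copied from AllConvNet.
LAYERS = [
    (3, 1, 1), (3, 1, 1), (3, 2, 1),
    (3, 1, 1), (3, 1, 1), (3, 2, 1),
    (3, 1, 1), (1, 1, 0), (1, 1, 0),
]


def trace_sizes(n=32):
    """Return the input size followed by the size after each layer."""
    sizes = [n]
    for kernel, stride, padding in LAYERS:
        sizes.append(conv_out_size(sizes[-1], kernel, stride, padding))
    return sizes


print(trace_sizes())
```

Under these assumptions the two stride-2 layers each halve the resolution, so the final mean is taken over an 8x8 map per class.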

Beautiful, Simon. Thank you so much for sharing! :smiley:

Did you get around 95% accuracy on CIFAR10? Do you remember your accuracies/errors?

Do you remember how long it took to run, btw?

I used it in a somewhat unconventional way: without dropout, trained for very few epochs, without lr decay, etc. But I remember getting 90% even under these conditions. I didn’t track the time, but it didn’t take too long, iirc.

How would you suggest using BN on this? After every single layer? Seems like a lot of batch norm…

Interesting. I’m not sure what I’m doing wrong, but I can’t get it much above chance… I think the only hyperparameter that differs from the ones on the website you shared is the batch size: they use 32, while I’m using a larger one, 2**10 I think. But that shouldn’t matter too much… right?
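
To put the batch-size question in numbers: CIFAR10 has 50,000 training images, so the batch size fixes how many optimizer updates one epoch contains. A small sketch of that arithmetic; `updates_per_epoch` is an illustrative helper, not something from the linked code.

```python
import math

TRAIN_SIZE = 50_000  # number of CIFAR10 training images


def updates_per_epoch(batch_size, n=TRAIN_SIZE):
    """Number of optimizer steps in one pass over the data (last batch may be partial)."""
    return math.ceil(n / batch_size)


for bs in (32, 2 ** 10):
    print(bs, updates_per_epoch(bs))
```

For the same number of epochs, the larger batch takes roughly 32 times fewer steps, so an lr and epoch count tuned for batch size 32 may not transfer unchanged.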

I’m not sure how to apply BN on this, but it may not be necessary as it is not too deep.
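
For reference, one common convention (not taken from the paper or from the code above) is to place `nn.BatchNorm2d` between each convolution and its ReLU. A minimal sketch assuming that ordering; `conv_bn_relu` is a hypothetical helper, and only the first three layers are rebuilt here.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch, out_ch, kernel_size, stride=1, padding=0):
    """Conv -> BatchNorm -> ReLU. The conv bias is dropped because BN adds its own shift."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size, stride=stride, padding=padding, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )


# First three layers of AllConvNet rebuilt with the helper (assumed layout).
block = nn.Sequential(
    conv_bn_relu(3, 96, 3, padding=1),
    conv_bn_relu(96, 96, 3, padding=1),
    conv_bn_relu(96, 96, 3, stride=2, padding=1),
)

x = torch.randn(4, 3, 32, 32)  # a fake batch of four CIFAR10-sized images
print(block(x).shape)
```

Whether BN after every layer helps this particular network is something to check empirically; the sketch only shows where the layers would go.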

Did you make sure that you normalize the image correctly? Maybe tune the lr?

OK, I will try a few more learning rates, why not…

I am using the preprocessing from the standard PyTorch CIFAR10 tutorial… the paper/that code is not using that; it’s using this instead:

# Per-channel statistics of the training images, scaled to [0, 1].
train_mean = trainset.train_data.mean(axis=(0,1,2))/255  # [0.49139968  0.48215841  0.44653091]
train_std = trainset.train_data.std(axis=(0,1,2))/255  # [0.24703223  0.24348513  0.26158784]
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),  # Normalize operates on tensors, so convert first
    transforms.Normalize(train_mean, train_std),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(train_mean, train_std),
])

while I am using (from the CIFAR10 tutorial):

    transform = []
    # ToTensor converts (HxWxC) in range [0,255] to (CxHxW) in range [0.0,1.0].
    to_tensor = transforms.ToTensor()
    transform.append(to_tensor)
    # Given means (M1,...,Mn) and stds (S1,...,Sn) for n channels:
    # input[channel] = (input[channel] - mean[channel]) / std[channel]
    if standardize:
        # With mean 0.5 and std 0.5 this maps [0, 1] to the range [-1, 1].
        gaussian_normalize = transforms.Normalize( (0.5, 0.5, 0.5), (0.5, 0.5, 0.5) )
        transform.append(gaussian_normalize)
    transform = transforms.Compose(transform)

I guess since it’s not the same data pre-processing, maybe I need to play around with the lr and scheduler more on my own… I will try that! Thanks for the ideas.

OK, now I need to try their pre-processing, but I find it really worrying that it’s so sensitive to everything! The pre-processing doesn’t even seem that different to me, 0.5 vs 0.2 etc.? Bad signs for reproducibility… :confused:

Thanks for the help though.
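
On the "0.5 vs 0.2" point above: the two sets of constants map pixel values in [0, 1] to different input ranges, which can be worked out directly from (x - mean) / std. A small sketch using the statistics quoted in the thread; `normalized_range` is an illustrative helper.

```python
# Per-channel statistics quoted in the thread (training-set mean/std, scaled to [0, 1]).
TRAIN_MEAN = (0.49139968, 0.48215841, 0.44653091)
TRAIN_STD = (0.24703223, 0.24348513, 0.26158784)


def normalized_range(mean, std):
    """Range that pixel values in [0, 1] are mapped to by (x - mean) / std."""
    return ((0.0 - mean) / std, (1.0 - mean) / std)


# Tutorial constants: mean 0.5, std 0.5 for every channel.
print("tutorial:", normalized_range(0.5, 0.5))

# Dataset statistics: one range per channel.
for mean, std in zip(TRAIN_MEAN, TRAIN_STD):
    low, high = normalized_range(mean, std)
    print(f"dataset stats: [{low:.2f}, {high:.2f}]")
```

With the dataset statistics the inputs have roughly twice the scale of the tutorial's [-1, 1] inputs, which is one plausible reason an lr tuned for one setup does not carry over to the other.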

I wouldn’t say it is a bad sign for reproducibility. Architecture, normalization method, lr, optimization algorithm, weight init: they all matter. With a different set of normalization constants, you will definitely need to tune other things too to make it work. Unfortunately, that is the current state of DL.

This is really weird. I think I’ve used all the hyperparams and preprocessing from the link StefOe provided, but I can’t get lower than 0.2 error :frowning: I wonder what I’m doing wrong…

Any updates? I am trying to train AllConvNet on CIFAR10, with no success. It is strange.


I tried to train this network. However, the loss quickly converges to 2.3026 and then stops decreasing. Any ideas on how to solve it?
When I tried to debug it, I found that the output of the network, of shape [batch_size, 10], is all zeros.
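
The two observations are consistent with each other: 2.3026 is ln(10), which is exactly the cross-entropy loss when all 10 class scores are equal, as they are for an all-zero output. A minimal sketch in plain Python; `cross_entropy` here is a hand-written illustration, not the PyTorch function.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


zero_logits = [0.0] * 10  # what the poster observed for every sample
loss = cross_entropy(zero_logits, target=3)
print(round(loss, 4), round(math.log(10), 4))
```

One possible cause, offered as a guess rather than a diagnosis: the final `F.relu` on `class_conv` clamps every negative score to zero, and if all scores go negative, both the output and its gradient are zero. Trying the network without that last ReLU, or with a smaller lr, would test this.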