Bias breaks learning?

I have a really basic conv net training on MNIST: just conv2d -> relu -> conv2d -> relu -> CrossEntropyLoss. When I initialize the Conv2d layers with bias=True, it trains for many epochs through the dataset while remaining at chance accuracy (10%). But with bias=False it reaches 90% accuracy after 5 epochs. Can anyone explain why this might be happening? If it matters, the last conv2d is effectively a linear layer: its kernel size equals its input size.

That sounds weird. Do you mind posting your script?

Don’t mind at all. Not sure how to correctly format code here; hope this is right. I started from the pytorch wide-resnet code, since later I’ll need to build it back up to that architecture: https://github.com/meliketoy/wide-resnet.pytorch. Toggling bias between True and False reproducibly switches it between not learning and learning.

modified networks/wide_resnet.py:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


def conv_init_pr(model):
    # He-style init, scaled by fan-out (kernel area * out_channels)
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))


class wide_basic(nn.Module):
    def __init__(self, in_planes, planes, dropout_rate, stride=1, kernel_size=3):
        super(wide_basic, self).__init__()
        # dropout_rate is unused in this stripped-down block
        self.conv2 = nn.Conv2d(in_planes, planes, kernel_size=kernel_size,
                               stride=stride, padding=1, bias=True)

    def forward(self, x):
        return self.conv2(F.relu(x))


class Wide_ResNet(nn.Module):
    def __init__(self, depth, widen_factor, dropout_rate, num_classes):
        super(Wide_ResNet, self).__init__()
        # depth/widen_factor/num_classes are unused in this stripped-down version
        self.in_planes = 3
        n = 1  # one block per layer
        k = 1  # widen factor (unused here)
        # 3x3 conv, stride 2: a 28x28 MNIST input becomes 14x14
        self.layer1 = self._wide_layer(wide_basic, 16, n, dropout_rate, stride=2, kernel_size=3)
        # 16x16 kernel over the padded 14x14 map gives 1x1, so this acts as a linear layer
        self.layer2 = self._wide_layer(wide_basic, 10, n, dropout_rate, stride=2, kernel_size=16)

    def _wide_layer(self, block, planes, num_blocks, dropout_rate, stride, kernel_size):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_planes, planes, dropout_rate, stride, kernel_size))
            self.in_planes = planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = F.relu(out)  # note: this applies ReLU to the logits before CrossEntropyLoss
        return out.squeeze()


if __name__ == '__main__':
    net = Wide_ResNet(28, 10, 0.3, 10)
    # smoke test left over from the CIFAR repo; with a 32x32 input layer2's output is
    # 2x2 rather than the 1x1 you get from a 28x28 MNIST input
    y = net(Variable(torch.randn(1, 3, 32, 32)))
    print(y.size())

...

I’m not spotting anything weird… Could you try different optimizer and lr combinations?

Tried SGD, Adam, and Adadelta, with learning rates from 1 down to 0.001 (stepped every 10 epochs). Same result every time: learns without bias, doesn’t learn with bias.
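
For reference, one of the setups looked roughly like this (the momentum and decay factor here are assumptions for the sketch, not necessarily what I ran):

optimizer = torch.optim.SGD(net.parameters(), lr=1.0, momentum=0.9)
# step the lr down 10x every 10 epochs: 1 -> 0.1 -> 0.01 -> 0.001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)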

I’ve also now tried making the last layer an actual linear layer, and once again it doesn’t work with a bias and works with bias=False. I also tried manually adding a bias (with bias=False on the layers) using the following, and it still doesn’t work:

self.values = nn.Parameter(torch.Tensor(planes).zero_().cuda(), requires_grad=True)
# note: assigning an nn.Parameter attribute already registers it, so this call is redundant
self.register_parameter('values', self.values)

Edit: Actually it does work with the output linear layer having a bias, but a bias on the hidden convolutional layer will break it.

Have you tried initializing the bias with zeros?
Add the following line to conv_init_pr and try running it again:

m.bias.data.zero_()
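
i.e. the whole function would look something like this (with a guard so layers created with bias=False don’t crash):

def conv_init_pr(model):
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
            m.weight.data.normal_(0, math.sqrt(2. / n))
            if m.bias is not None:  # layers created with bias=False have no bias tensor
                m.bias.data.zero_()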

No luck with that either. I did confirm that the bias starts out at zeros when added.

Alright, thanks to both of you for the replies! And zeroing the bias does help, now that it’s working.

The problem was that my MNIST data was scaled to 0–1 while the pipeline (written for PIL images) expected 0–255, so all the input values were tiny. Multiplying the input data by 255 makes it work. I still don’t understand why this particular problem would come and go with the bias term; if small input values were the issue, it seems like it should also fail without the bias. But at least it’s working, so thanks again!
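
For anyone who hits the same thing, the fix looks roughly like this (assuming torchvision’s MNIST loader; ToTensor is what produces the 0–1 range):

from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),                   # PIL image -> float tensor in [0, 1]
    transforms.Lambda(lambda x: x * 255.0),  # rescale back to the [0, 255] range
])
train_set = datasets.MNIST('./data', train=True, download=True, transform=transform)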
