A question about applying different weight decay to different parameters in PyTorch

Hello everyone,

I am reading the paper “Bag of Tricks for Image Classification with Convolutional Neural Networks”, and I want to test the impact of “no bias decay”: regularization is applied only to the convolution weights, while the other parameters, including the biases and the parameters of the BN layers, are left unregularized.

I checked the convolution code in PyTorch, but it seems to me that there is no way to apply different regularization to a conv layer's weight and bias separately.

Here is the code I found for the definition of convolution (_ConvNd):
class _ConvNd(Module):

    __constants__ = ['stride', 'padding', 'dilation', 'groups', 'bias',
                     'padding_mode', 'output_padding', 'in_channels',
                     'out_channels', 'kernel_size']

    def __init__(self, in_channels, out_channels, kernel_size, stride,
                 padding, dilation, transposed, output_padding,
                 groups, bias, padding_mode):
        super(_ConvNd, self).__init__()
        if in_channels % groups != 0:
            raise ValueError('in_channels must be divisible by groups')
        if out_channels % groups != 0:
            raise ValueError('out_channels must be divisible by groups')
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.transposed = transposed
        self.output_padding = output_padding
        self.groups = groups
        self.padding_mode = padding_mode
        if transposed:
            self.weight = Parameter(torch.Tensor(
                in_channels, out_channels // groups, *kernel_size))
        else:
            self.weight = Parameter(torch.Tensor(
                out_channels, in_channels // groups, *kernel_size))
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

If you have any ideas about this, please share them in this post. Thank you!
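
Looking at the _ConvNd code you quoted, weight and bias are registered as two separate Parameters, so the optimizer can already address them independently. A quick check with a standalone Conv2d (a minimal sketch, not tied to any particular model):

import torch

conv = torch.nn.Conv2d(3, 16, kernel_size=3)
print(type(conv.weight), type(conv.bias))  # both are torch.nn.parameter.Parameter
for name, p in conv.named_parameters():
    print(name, tuple(p.shape))            # 'weight' and 'bias' appear as separate parameters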

In my understanding, the code below should work. I have not tested it, but it is worth trying.

# options set inside a parameter group override the defaults passed at the end,
# so conv_layer.bias gets weight_decay=0.0 while conv_layer.weight keeps its own decay
optim.SGD([
        {'params': other_parameters},  # uses the default lr and weight_decay below
        {'params': model.conv_layer.weight, 'lr': 1e-3, 'weight_decay': 0.9},
        {'params': model.conv_layer.bias, 'lr': 1e-3, 'weight_decay': 0.0}
    ], lr=1e-2, momentum=0.9, weight_decay=0.98)
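
To apply the “no bias decay” trick to a whole network (no decay on any bias or BatchNorm parameter), the two parameter groups can also be built automatically. A minimal sketch, assuming that every 1-D parameter is either a bias or a BN weight/bias and should be left unregularized; the resnet18 model and the 1e-4 decay are just placeholders:

import torch
import torchvision

model = torchvision.models.resnet18()  # placeholder model

decay, no_decay = [], []
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue
    # biases and BatchNorm gamma/beta are all 1-D tensors
    if param.dim() == 1 or name.endswith('.bias'):
        no_decay.append(param)
    else:
        decay.append(param)

optimizer = torch.optim.SGD([
    {'params': decay, 'weight_decay': 1e-4},    # conv/fc weights: regularized
    {'params': no_decay, 'weight_decay': 0.0},  # biases and BN parameters: unregularized
], lr=1e-2, momentum=0.9)

Because per-group options override the optimizer defaults, the weight_decay=0.0 group is what keeps the biases and BN parameters out of the regularization.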