Is there a better way to generate monotone positive trainable weights?

Hi, I was trying to generate a group of trainable, monotone, positive tensors to use as the weights for Conv1d. It works to some extent, but the problem is that the real trainable parameters "alpha"/"beta" change very little compared with ordinary convolution weights; they just fluctuate around their initial values. So I suspect the gradients are hard to train with because of the way I create those weights. Just curious whether there is a better way to handle it. Thanks!

import torch
import torch.nn as nn
import torch.nn.functional as F


class Fir1d(nn.Module):
    def __init__(self, k_size, device, init_alpha, init_beta=0):
        super().__init__()
        # every channel uses same kernel weights
        assert k_size % 2 == 1
        m = (k_size-1)//2
        self.leftpad = nn.ReplicationPad1d((k_size-1, 0))
        self.lin = torch.linspace(-m, m, k_size, device=device, requires_grad=False)
        self.alpha = nn.Parameter(torch.tensor([init_alpha], device=device, dtype=torch.float32, requires_grad=True))
        self.beta = nn.Parameter(torch.tensor([init_beta], device=device, dtype=torch.float32, requires_grad=True))
        self.w = torch.softmax(self.lin * self.alpha + self.beta, dim=0).unsqueeze(0).unsqueeze(0)  # 1,1,k


    def forward(self, x):
        # b,c,l
        xlist = []
        x = self.leftpad(x)
        B,C,L = x.shape
        for i in range(C):
            subx = x[:, [i], :]
            main_trend = F.conv1d(subx, self.w)
            xlist.append(main_trend)
        xlist = torch.cat(xlist, dim=1)
        return xlist

Hi Ximeng!

Am I correct that you want the individual weight-values in your conv1d()
kernel to be positive and monotonically increasing, even as they train?

This won’t work.

You are only calling:

self.w = torch.softmax(self.lin * self.alpha + self.beta, dim=0).unsqueeze(0).unsqueeze(0)

once, when you initialize your Fir1d model. Even though you update alpha
and beta when you run your optimizer step, you never recompute w (your
kernel weights), so changing the values of alpha and beta doesn’t actually
do anything.
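As a quick sanity check (a minimal sketch, assuming the Fir1d class and the
imports from the original post, run on the cpu), you can verify that changing
alpha after construction leaves self.w untouched:

fir = Fir1d(k_size=5, device='cpu', init_alpha=1.0)
w_before = fir.w.detach().clone()
with torch.no_grad():
    fir.alpha += 1.0                     # simulate an optimizer update
print(torch.equal(w_before, fir.w))      # True -- w never sees the new alpha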

Recompute w inside of forward() (and just have it be a local variable of
Fir1d's forward() method), e.g.:

        ...
        w = torch.softmax(self.lin * self.alpha + self.beta, dim=0).unsqueeze(0).unsqueeze(0)
        for i in range(C):
            subx = x[:, [i], :]
            main_trend = F.conv1d(subx, w)
            xlist.append(main_trend)
        xlist = torch.cat(xlist, dim=1)
        return xlist

or, probably more efficiently, without the loop:

    def forward(self, x):
        x = self.leftpad(x)
        C = x.size(1)
        w = torch.softmax(self.lin * self.alpha, dim=0).unsqueeze(0).expand(C, 1, self.lin.size(0))
        return F.conv1d(x, w, groups=C)
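For what it's worth, here is a small check (shapes chosen arbitrarily, assuming
import torch and torch.nn.functional as F) that a (C, 1, k) weight with
groups = C convolves each channel independently with the same kernel, just like
the per-channel loop above:

B, C, L, k = 2, 3, 10, 5
x = torch.randn(B, C, L)
w1 = torch.softmax(torch.linspace(-2.0, 2.0, k), dim=0).view(1, 1, k)
looped = torch.cat([F.conv1d(x[:, [i], :], w1) for i in range(C)], dim=1)
grouped = F.conv1d(x, w1.expand(C, 1, k), groups=C)
print(torch.allclose(looped, grouped))   # True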

I left self.beta out of the loop-free version because it doesn't do anything:
softmax() takes "raw-score" logits that it, in effect, "normalizes," and adding
the same constant (self.beta) to every logit leaves the softmax() output
unchanged, so self.beta drops out of the computation.

So (even if you leave beta in) you will be training only one kernel-weight
parameter, alpha.
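Here is a quick illustration of that point (any constant shift works, not just
self.beta):

logits = torch.linspace(-2.0, 2.0, 5)
print(torch.allclose(torch.softmax(logits, dim=0),
                     torch.softmax(logits + 3.7, dim=0)))   # True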

As an aside, if you wanted your kernel weights to depend on more trainable
parameters, while still being positive and increasing monotonically, you could:

    def __init__(self, k_size, device, init_alpha, init_beta=0):
        ...
        self.kernel_parameters = nn.Parameter(torch.zeros(k_size))   # initial value

    def forward(self, x):
        x = self.leftpad(x)
        C = x.size(1)
        w = self.kernel_parameters.exp().cumsum(0).unsqueeze(0).expand(C, 1, self.kernel_parameters.size(0))
        return F.conv1d(x, w, groups=C)

Your raw kernel_parameters run from -inf to inf. .exp() will cause your
derived weights, w, to be positive, and .cumsum() will cause your derived
weights to be monotonically increasing.
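Here is a minimal sketch (with an arbitrary random initialization) showing that
the exp-then-cumsum construction does give positive, strictly increasing
weights:

raw = torch.randn(7)                     # any raw kernel_parameters
w = raw.exp().cumsum(0)
print((w > 0).all().item())              # True  (positive)
print((w[1:] > w[:-1]).all().item())     # True  (monotonically increasing)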

Best.

K. Frank

Thank you, Frank! You are so professional. You've helped me a lot every time XD.