Per-Channel Normalization Layer implementation

Gautam_Bhattacharya · May 17, 2017, 6:31pm

Hello,
I am pretty new to PyTorch and I am trying to implement the trainable layer proposed in this paper: https://arxiv.org/pdf/1607.05666.pdf
Here is my code:

import torch.nn as nn
import torch
import numpy as np
from torch.autograd import Variable

class PCEN(nn.Module):
    def __init__(self,in_features,bias=False):
        super(PCEN,self).__init__()

    self.alpha = nn.Parameter(torch.Tensor(in_features,))
    self.delta = nn.Parameter(torch.Tensor(in_features,))
    self.r = nn.Parameter(torch.Tensor(in_features,))
    self.eps = torch.Tensor([0.00001])

def forward(self,x,smoother):
    alpha = self.alpha.expand_as(x)
    delta = self.delta.expand_as(x)
    r = self.r.expand_as(x)

    pcen = (x/(self.eps + smoother)**alpha + delta)**r - delta**r
    return pcen

#40 dimensional filterbank energies
pcen = PCEN(40)

#dummy data
feats = np.random.standard_normal(size=(10,40)).astype('float32')
smoother = np.random.standard_normal(size=(10,40)).astype('float32')

feats = Variable(torch.from_numpy(feats))
smoother = Variable(torch.from_numpy(smoother))

pcen_feats = pcen(feats,smoother)

Q. The eps parameter is only there to avoid division by zero, so I don’t need to use expand_as with it?

Q. The forward pass seems to be working, but I was wondering if there are any obvious errors? Do I need to use register_buffer for the parameters?

Q. In the paper they say that, to ensure parameter positivity, they do gradient updates on the log values of the parameters and then take exponentials. How can I go about doing this?

I have a couple more questions, but I will save them for now.

Thanks,
Gautam

tom · May 17, 2017, 7:46pm

Hello @Gautam_Bhattacharya

that seems like a great project!

Yes, that is usually just the regularisation. I’d even leave it as a Python float.

I think something is up with the indentation, but that is likely only the quoting, I have not checked in great detail.

You could use self.log_alpha, log_delta, log_r as the parameters (but ideally init to something close to 0 instead of 1, too) and then do alpha = self.log_alpha.exp().expand_as(x).

I hope this helps.

Best regards

Thomas

Gautam_Bhattacharya · May 17, 2017, 8:03pm

Thanks for the reply @tom

Yeah, the indentation of the forward function got messed up while I was pasting the code.

“You could use self.log_alpha, log_delta, log_r as the parameters (but ideally init to something close to 0 instead of 1, too) and then do alpha = self.log_alpha.exp().expand_as(x).”

I am confused as to what you mean exactly. Let’s say I init them properly; in the paper they initialize with a normal distribution with mean 1 and std 0.1.
Q. When exactly would I take the log?

I thought I could do something like the following for a simple version of SGD, though it would be nice to use PyTorch’s optimizers:

for p in pcen.parameters():
    p_log = torch.log(p)
    p_log.data.add_(-learning_rate, p.grad.data) #or p_log.grad.data?
    # and then somehow copy the exponentiated log parameters back to p

Q. or is all this not necessary based on the approach you proposed?

Thanks,
Gautam

tom · May 17, 2017, 9:30pm

Hi,

apologies for being less clear.
I’d do something like the following (the probability of the log going wrong is not that large, given that the mean is 10 standard deviations from 0):

class PCEN(nn.Module):
    def __init__(self,in_features,bias=False):
        super(PCEN,self).__init__()

        self.log_alpha = nn.Parameter((torch.randn(in_features)*0.1+1.0).log_())
        self.log_delta = nn.Parameter((torch.randn(in_features)*0.1+1.0).log_())
        self.log_r     = nn.Parameter((torch.randn(in_features)*0.1+1.0).log_())
        self.eps = 0.00001

    def forward(self,x,smoother):
        alpha = self.log_alpha.exp().expand_as(x)
        delta = self.log_delta.exp().expand_as(x)
        r     = self.log_r.exp().expand_as(x)

        pcen = (x/(self.eps + smoother)**alpha + delta)**r - delta**r
        return pcen

#4 channels here to keep the example small (the original post used 40 filterbank energies)
pcen = PCEN(4)

#dummy data, energy is non-negative
feats = torch.randn(10,4).exp_()
smoother = torch.randn(10,4).exp_()

feats = Variable(feats)
smoother = Variable(smoother)

pcen_feats = pcen(feats,smoother)

This way, the backprop will just compute correct adjustments to the log parameters. My understanding is that they did it similarly.
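As a rough sketch of what that looks like with one of the built-in optimizers: the stored parameters are the logs, the forward pass exponentiates them, and a plain torch.optim.SGD step updates the logs directly. The class name LogParamPCEN, the mean-based placeholder loss, and the learning rate are assumptions made for illustration only. The sketch also assumes a recent PyTorch version, where tensors broadcast without expand_as and no Variable wrapper is needed.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class LogParamPCEN(nn.Module):  # hypothetical name, mirrors the PCEN module above
    def __init__(self, in_features, eps=1e-5):
        super().__init__()
        # Store the logs of values drawn around 1.0 (std 0.1), as in the thread.
        self.log_alpha = nn.Parameter((torch.randn(in_features) * 0.1 + 1.0).log())
        self.log_delta = nn.Parameter((torch.randn(in_features) * 0.1 + 1.0).log())
        self.log_r = nn.Parameter((torch.randn(in_features) * 0.1 + 1.0).log())
        self.eps = eps

    def forward(self, x, smoother):
        # exp() of any real number is positive, so positivity holds by construction.
        alpha = self.log_alpha.exp()
        delta = self.log_delta.exp()
        r = self.log_r.exp()
        return (x / (self.eps + smoother) ** alpha + delta) ** r - delta ** r

layer = LogParamPCEN(4)
optimizer = torch.optim.SGD(layer.parameters(), lr=1e-3)  # lr is an arbitrary choice

# Positive dummy data, generated the same way as above.
feats = torch.randn(10, 4).exp()
smoother = torch.randn(10, 4).exp()

log_alpha_before = layer.log_alpha.detach().clone()

loss = layer(feats, smoother).mean()  # placeholder loss, only to obtain gradients
optimizer.zero_grad()
loss.backward()   # gradients are taken with respect to the log parameters
optimizer.step()  # the optimizer moves the logs; no manual exp/log copying

alpha_after = layer.log_alpha.exp()
print(alpha_after)
```

Whatever value the update gives log_alpha, the alpha used in the forward pass stays positive, which is the point of parameterizing in log space.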

I also took the liberty to generate the dummy data in pytorch directly and to make it positive with exp_. The fractional powers don’t really mix well with negative numbers (that is why your code got NaNs) and we all prefer positive energy.
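A small illustration of the NaN issue, assuming a recent PyTorch version: raising a negative number to a fractional power gives NaN, while data passed through exp is always positive and safe to exponentiate. The variable names are made up for this sketch.

```python
import torch

torch.manual_seed(0)

base = torch.tensor([-4.0, 4.0])
root = base ** 0.5                   # fractional power: NaN for the negative entry
print(root)

positive = torch.randn(10, 4).exp()  # exp maps any real number to a positive one
powered = positive ** 0.5            # no NaNs here
print(torch.isnan(powered).any())
```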

Best regards

Thomas

Gautam_Bhattacharya · May 17, 2017, 10:10pm

Hey Tom,

Thank you! I think this has to be the right way to do it.

Yup, positive energy all the way. Only way to #feelthelearn

Gautam

Gautam_Bhattacharya · June 29, 2017, 8:03pm

@tom Hi Thomas,

I hope you still remember this post (not to mention see this one).
I have been experimenting with this model, and so far it does OK, but it still performs worse than my baseline. I still have a few optimization tricks to try.

You said:
This way, the backprop will just compute correct adjustments to the log parameters. My understanding is that they did it similarly.

Does this mean that when I do my loss.backward(), an in-place log will be taken of the associated parameters before computing their gradients?
I am just trying to tie up any possible loose ends, though since I do get sensible results, I think it’s more of an optimization issue.

Thanks,
Gautam
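
For readers with the same question, here is a minimal check, offered as a sketch rather than an authoritative answer. It assumes a recent PyTorch version and uses a made-up toy loss. With the log parameterization, the stored tensor already is the log value, so backward() takes no log at all. It differentiates through the exp() in the forward pass, so the gradient with respect to log_alpha is the gradient with respect to alpha multiplied by alpha (chain rule).

```python
import torch

# The stored parameter is the log value itself (toy numbers).
log_alpha = torch.tensor([0.1, -0.2, 0.3], requires_grad=True)
x = torch.tensor([1.5, 2.0, 0.5])

stored_before = log_alpha.detach().clone()

alpha = log_alpha.exp()       # positive by construction
loss = (x ** alpha).sum()     # toy loss standing in for the real one
loss.backward()

# Chain rule by hand: dL/dalpha = x**alpha * ln(x), and dalpha/dlog_alpha = alpha.
with torch.no_grad():
    manual = (x ** alpha) * x.log() * alpha

print(torch.allclose(log_alpha.grad, manual))
print(torch.equal(log_alpha.detach(), stored_before))  # backward() did not modify the parameter
```

Only the optimizer step changes log_alpha afterwards. backward() just fills in log_alpha.grad.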