Parameters stuck at zero for custom layer

Hello, I’m a beginner with PyTorch, so I’m not sure whether I’m approaching this correctly. I have a layer at the beginning of my network that applies an acoustic filter to the magnitude of the FFT of an audio signal (see equation C.171 from here for more details).

Here’s what I have for my layer implementation:

import librosa
import torch


class ConeFilter(torch.nn.Module):
    def __init__(self, 
        n_fft=512, 
        sample_rate=44100.0,
        steps=100):
        super().__init__()
        self.x0 = torch.nn.Parameter(torch.zeros(1))
        self.angle = torch.nn.Parameter(torch.zeros(1))
        self.depth = torch.nn.Parameter(torch.zeros(1))
        self.freq_map = torch.from_numpy(librosa.fft_frequencies(sr=sample_rate, n_fft=n_fft)).float()
        self.c = 343 # Speed of sound in air m/s
        self.rho = 1.293 # Air density kg/m^3
        self.steps = steps
        self.pi = 3.1415927

    def forward(self, noisy):
        scalar = self.c*self.rho
        result = noisy
        self.freq_map = self.freq_map.to('cuda:0')
        tot_impedance = None
        myx0 = torch.relu(self.x0) # x0 must be non-negative
        mydepth = torch.relu(self.depth) # depth must be non-negative
        myangle = ((torch.sigmoid(self.angle) + 1.0)/2.0) * (self.pi/2.0)

        for i in range(self.steps):
            x = (myx0 + (i * mydepth/self.steps)).float()
            r = torch.tan(myangle) * x
            numer = torch.mul(self.freq_map, x)
            denom = torch.add(numer, self.c)
            frac = torch.div(numer,denom)
            
            impedance = frac * self.rho * self.c / (self.pi * r * r)         

            if tot_impedance is None:
                tot_impedance = impedance
            else:
                tot_impedance = torch.mul(tot_impedance, impedance)
        tot_impedance = tot_impedance.unsqueeze(1)
        tot_impedance = tot_impedance.unsqueeze(0)
        result = torch.mul(result, tot_impedance)
        self.print_parameters()
        return result

    def print_parameters(self):
        print("x0: " + str(self.x0.data))
        print("angle: " + str(self.angle.data * self.pi / 180.0))
        print("depth: " + str(self.depth.data))

The idea is that I want the model to converge on the optimal cone that is expressed by the three terms: x0, angle, and depth. My problem is that the output of the print_parameters() function shows that the data values are always stuck at 0.0. I read on another post that accessing the .data field may break the autograd computation graph, but I’m not sure how else to verify that the weights are changing appropriately through training. Any insight would be greatly appreciated.

Just a thought:
Would it be better to use self.x0.detach().item() instead of self.x0.data here? My intuition is that, when printing, we should first detach the parameter from the graph and then call item() to get the value. That way printing would not interfere with the gradients.
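
To illustrate the suggestion, here is a minimal sketch. The Tiny module is a hypothetical stand-in for ConeFilter, not the original layer. It prints a parameter through detach().item() and then checks that backward() still fills in the gradient.

```python
import torch


class Tiny(torch.nn.Module):
    """Hypothetical stand-in for ConeFilter with a single parameter."""

    def __init__(self):
        super().__init__()
        self.x0 = torch.nn.Parameter(torch.ones(1))

    def forward(self, inp):
        return inp * self.x0

    def print_parameters(self):
        # detach() gives a view that is cut off from the autograd graph;
        # item() turns the single element into a plain Python float.
        print("x0:", self.x0.detach().item())


layer = Tiny()
out = layer(torch.full((4,), 2.0)).sum()  # sum of four copies of 2 * x0
layer.print_parameters()                  # printing happens before backward()
out.backward()
grad_x0 = layer.x0.grad.item()            # d(8 * x0)/d(x0) = 8
print("grad of x0:", grad_x0)
```

Printing this way only reads the value, so if the parameters still do not move, the cause is likely somewhere else.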

Hmm, I changed the calls in the print_parameters function as you described. The values still seem to be stuck at 0. I was concerned about looping within a forward pass, and whether that would affect the computation graph. Would this be a problem?

Interesting. I don’t think looping should create a problem. Can you initialize the parameters to something other than 0 and check?
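
As a quick sanity check on the looping question, here is a small sketch (a toy example, not the ConeFilter itself) that builds a running product inside a Python loop, the same pattern as tot_impedance, and confirms that the gradient reaches the parameter.

```python
import torch

w = torch.nn.Parameter(torch.tensor([2.0]))

total = None
for i in range(3):
    term = w * (i + 1)  # w, 2w, 3w
    # Same accumulation pattern as tot_impedance in the original forward().
    total = term if total is None else total * term

# total = 6 * w**3, so d(total)/dw = 18 * w**2 = 72 at w = 2
total.sum().backward()
loop_grad = w.grad.item()
print("gradient through the loop:", loop_grad)
```

Each iteration just adds more nodes to the graph, so autograd handles the loop like any other sequence of operations.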

I would also suggest initializing the parameters with random values, as the zero init could cause the training to fail.
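
One possible reason the zero init matters for this particular layer (my reading of the posted code, not something confirmed in the thread): x0 and depth go through torch.relu, and relu passes a zero gradient when its input is exactly 0. A minimal sketch with hypothetical parameters p_zero and p_pos:

```python
import torch

p_zero = torch.nn.Parameter(torch.zeros(1))         # starts exactly at 0
p_pos = torch.nn.Parameter(torch.full((1,), 0.5))   # starts at a positive value

(torch.relu(p_zero) * 3.0).sum().backward()
(torch.relu(p_pos) * 3.0).sum().backward()

zero_grad = p_zero.grad.item()  # relu blocks the gradient at input 0
pos_grad = p_pos.grad.item()    # gradient passes through for positive input
print("grad at 0:", zero_grad, "| grad at 0.5:", pos_grad)
```

In the ConeFilter above there may be a second effect as well: with x0 = 0 and depth = 0, x is 0 at every step, so r * r in the denominator of the impedance is also 0.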


Oh, is that something I would do when I instantiate the parameter in __init__? Do you have a pointer to documentation, or could you provide a short code snippet on how to do that?

Also, as an aside, although the ConeFilter is a layer created in my network’s torch.nn.ModuleList, when I send the net to the GPU with the following:

net = Network(
    args.threshold,
    args.tau_grad,
    args.scale_grad,
    args.dmax,
    args.out_delay)
net = torch.nn.DataParallel(net.to(device), device_ids=args.gpu)

The freq_map remains on the CPU, so I have to manually send it to the GPU at the beginning of every forward pass (self.freq_map = self.freq_map.to('cuda:0')). Is there any way around this?

You could use any factory method, e.g. torch.randn, or you could also initialize it via torch.nn.init methods.
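
A short sketch of both options. The names mirror the parameters in the post, and the uniform range is an arbitrary choice for illustration only.

```python
import torch

# Option 1: a factory method, here a draw from a standard normal distribution.
x0 = torch.nn.Parameter(torch.randn(1))

# Option 2: allocate first, then fill in place with a torch.nn.init function.
depth = torch.nn.Parameter(torch.empty(1))
torch.nn.init.uniform_(depth, a=0.1, b=1.0)  # range picked only as an example

print("x0:", x0.detach().item(), "| depth:", depth.detach().item())
```

Inside a module these lines would go in __init__, assigned to self.x0 and self.depth.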

self.freq_map is a plain tensor and thus not registered to the module.
I assume you don’t want to train this tensor, so register it as a buffer via self.register_buffer and the to() calls will move it to the GPU as well.
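
Something like this sketch shows the idea. WithBuffer is a hypothetical module, and torch.linspace stands in for the librosa.fft_frequencies call so that the example has no extra dependency.

```python
import torch


class WithBuffer(torch.nn.Module):
    """Hypothetical module holding a non-trainable frequency map."""

    def __init__(self, n_bins=5):
        super().__init__()
        # Registered buffers follow the module in to()/cuda()/cpu() calls
        # and are saved in state_dict(), but are not returned by parameters().
        self.register_buffer("freq_map", torch.linspace(0.0, 22050.0, n_bins))
        self.gain = torch.nn.Parameter(torch.ones(1))

    def forward(self, x):
        return x * self.gain * self.freq_map


device = "cuda" if torch.cuda.is_available() else "cpu"
m = WithBuffer().to(device)

buffer_names = [name for name, _ in m.named_buffers()]
param_names = [name for name, _ in m.named_parameters()]
print("buffers:", buffer_names, "| parameters:", param_names)
print("freq_map device:", m.freq_map.device)
```

With this in place, the manual self.freq_map.to('cuda:0') call in forward() should no longer be needed.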

Hm, so I used torch.randn(1) in the Parameter instantiation statements, and now I can see that the initial values of the parameters are no longer 0.0. But, the problem remains that the values of the parameters are stuck and don’t change as training progresses. I’m using the RAdam optimizer module. Below is a simplified version of the code I’m running in each training iteration.

denoised_abs = net(noisy_abs)
clean_rec = stft_mixer(denoised_abs, noisy_arg, args.n_fft)
score = si_snr(clean_rec, clean)
loss = lam * F.mse_loss(denoised_abs, clean_abs) + (100 - torch.mean(score))
if torch.isnan(loss).any():
    loss[torch.isnan(loss)] = 0
assert not torch.isnan(loss).any()

optimizer.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(net.parameters(), args.clip)
optimizer.step()
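
One way to narrow this down is to look at the gradients right after loss.backward() and before optimizer.step(). The helper below, grad_report, is a hypothetical name and only a sketch. It separates three cases: no gradient at all (the parameter is not connected to the loss), a NaN gradient, and a finite gradient norm (which may be exactly 0).

```python
import torch


def grad_report(model):
    """Summarize the gradient state of every parameter after backward()."""
    report = {}
    for name, p in model.named_parameters():
        if p.grad is None:
            report[name] = "no grad"      # never reached by backward()
        elif torch.isnan(p.grad).any():
            report[name] = "nan"          # a NaN appeared somewhere upstream
        else:
            report[name] = p.grad.norm().item()
    return report


# Toy usage with a stand-in network; in the training loop above this would be
# called as grad_report(net) between loss.backward() and optimizer.step().
toy_net = torch.nn.Linear(3, 1)
toy_loss = toy_net(torch.ones(2, 3)).pow(2).mean()
toy_loss.backward()
report = grad_report(toy_net)
print(report)
```

If the ConeFilter parameters show "no grad", "nan", or a norm of 0.0, that would explain why optimizer.step() leaves them unchanged.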

P.S. Thanks for the tip with register_buffer, that part seems to be working better now!