Custom layer with Numpy computation

Hi PyTorch community,

I am designing a network with a custom layer, in which I am using a third-party framework that requires numpy arrays. Since my layer was not behaving as expected, I tried to find the problem by drastically simplifying it.

At this point I am simply echoing the input to my layer back into the network, all parameters’ requires_grad are set to False, and all custom gradients return None. So I would assume that my custom layer should have zero impact on training, compared to the performance of the network without this layer at all. All I do is convert a tensor to numpy and back, and yet there is a palpable negative impact on training convergence.

A toy model of what I am attempting looks something like this, following the official tutorial:

def forward(self, x):
    x = torch.tanh(self.fc_1(x))
    ...

    # Disassemble the batch into individual data points
    x_out = []
    for x_in in x:
        # --- This should be the output from the custom layer ---
        x_numpy = x_in.detach().numpy()
        # -------------------------------------------------------
        x_out.append(x_in.new(x_numpy).reshape(1, -1))
    # Assemble them back into a batch
    x = torch.cat(x_out, dim=0)

    ...
    return torch.tanh(self.fc_n(x))

Trying the same weird construction without the conversion to numpy has zero influence on training. Is there a way to convert a numpy.array to a torch.tensor in the middle of your network without a training penalty?

If you are returning None gradients, you detach the computation graph at this point, so none of the preceding layers will receive any gradients, which could explain the issues you are seeing.
If you want to write a “no-op” numpy layer instead, you could pass the incoming gradients on to the preceding layer, as in the sketch below.
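Here is a minimal sketch of such a “no-op” numpy layer using torch.autograd.Function, assuming a single tensor input; the class name NumpyNoOp is made up for this example, and the numpy round-trip stands in for the call into your third-party framework:

import torch

class NumpyNoOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Round-trip through numpy (the third-party call would go here)
        x_numpy = x.detach().cpu().numpy()
        return torch.from_numpy(x_numpy).to(x.device).type_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity backward: hand the incoming gradient back to the
        # preceding layer instead of returning None, which would cut
        # the graph at this point.
        return grad_output

Inside your model’s forward you would then call x = NumpyNoOp.apply(x) instead of detaching manually, so the layers before it keep receiving gradients.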
