Use a linear layer in-place

Is it possible to use a linear layer (with the same input and output size) in-place? I don’t care about the gradients (torch.no_grad() is enabled). I want to use as little memory as possible because I’m querying the network many thousands of times per batch item (working with 3D point clouds).

I created a method to do this based on the implementation of torch.nn.functional.linear:

  def linear_inplace(layer, v):
    # out=v writes the result back into v's own buffer
    # (requires in_features == out_features)
    return torch.addmm(layer.bias, v, layer.weight.t(), out=v)
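For reference, the out-of-place form of this call does reproduce nn.Linear's forward exactly; a quick sanity check (sketch). As far as I know, PyTorch does not guarantee that out= may alias an input of addmm, so it is worth verifying the in-place numerics separately as well:

```python
import torch
import torch.nn as nn

# sanity check: out-of-place addmm matches nn.Linear's forward pass
layer = nn.Linear(4, 4)
v = torch.randn(8, 4)

with torch.no_grad():
    ref = layer(v)
    # fresh output tensor, no aliasing with the inputs
    out = torch.addmm(layer.bias, v, layer.weight.t())

print(torch.allclose(ref, out))  # True
```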

However, for some reason my model's weights don't seem to be on the GPU when I use this method instead of nn.Linear's standard __call__. I get the following error:

RuntimeError: Tensor for argument #3 'mat2' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

I know the CUDA device is selected correctly, because layer(v) works while my linear_inplace(layer, v) does not. Can someone help me understand what's going on here?
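When this error shows up, one quick way to localize it is to print where each tensor involved in the addmm call actually lives (a minimal sketch on CPU; on a broken setup you'd expect one of these to report cpu while the others report cuda:0):

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 4)
v = torch.randn(2, 4)

# print the device of every tensor the addmm call touches; a single
# mismatch (e.g. a CPU weight next to a CUDA input) triggers the error
for name, t in [("input", v), ("weight", layer.weight), ("bias", layer.bias)]:
    print(name, t.device)
```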

Update: I've tracked the error down to a problem with weight_norm, and opened an issue.
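For anyone hitting the same thing: the old nn.utils.weight_norm API replaces weight with a plain tensor and only recomputes it from weight_g/weight_v inside a forward pre-hook. Reading layer.weight directly, as linear_inplace does, can therefore see a stale copy, and after .cuda() that stale copy is still the one sitting on the CPU. A CPU-only sketch of the staleness mechanism:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.utils.weight_norm(nn.Linear(3, 3))

# `weight` is now a plain tensor computed once at wrap time; only a
# forward pass (via the registered pre-hook) recomputes it from
# weight_g and weight_v
with torch.no_grad():
    layer.weight_g.mul_(2.0)          # change the norm parameter

stale = layer.weight.clone()          # still the pre-change value
_ = layer(torch.zeros(1, 3))          # forward runs the pre-hook
fresh = layer.weight

print(torch.allclose(stale, fresh))        # False: weight was stale
print(torch.allclose(2.0 * stale, fresh))  # True: doubling g doubles weight
```

Two possible workarounds, under the same assumption about the cause: run one regular forward pass after the move to the device so the hook refreshes weight, or strip the reparametrization with nn.utils.remove_weight_norm before inference.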